Skip to main content

One post tagged with "yaml"

View All Tags

Patrick DeVivo

MergeStat's mission is to make it possible to query anything involved in building & shipping software with SQL. Recently, we had a user looking to query YAML files across their git repos, wanting to extract and aggregate values from config files.

We thought for a bit, and came up with a solution. An odd one, but one that works surprisingly well 😃.

How it works

  1. Ensure the PLV8 extension is installed in Postgres. This allows us to run JavaScript in the database.
  2. Create a user-defined function plv8_yaml_to_json() that uses an existing JavaScript YAML parser, to convert YAML strings into JSON.
  3. Write and execute SQL that uses the new function and the native Postgres JSON operators to query what you're looking for 🎉.

Here it is altogether!

What can we query?

Well, to showcase this, let's look at a git repo with a lot of YAML. One good example is the public bitnami/charts repo, which is a collection of Helm Charts for popular OSS applications to run on Kubernetes.

List all Helm charts (and their annotated category)

WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
chart->>'name' AS chart_name,
chart->>'description' AS chart_description,
chart->'annotations'->>'category' AS category
FROM bitnami_charts
chart_namechart_descriptioncategory
airflowApache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies.WorkFlow
apacheApache HTTP Server is an open-source HTTP server. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.Infrastructure
appsmithAppsmith is an open source platform for building and maintaining internal tools, such as custom dashboards, admin panels or CRUD apps.CMS
argo-cdArgo CD is a continuous delivery tool for Kubernetes based on GitOps.Infrastructure
argo-workflowsArgo Workflows is meant to orchestrate Kubernetes jobs in parallel. It uses DAG and step-based workflowsInfrastructure
...

How many charts are in each category?

WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
count(*),
chart->'annotations'->>'category' AS category
FROM bitnami_charts
GROUP BY category
ORDER BY count(*) DESC

Bitnami chart count by category

Hopefully that gives you a sense of what's possible chaining together YAML, MergeStat, JavaScript, and Postgres!

Join our Slack

If you found this interesting, hop in our community Slack! We're always happy to chat about MergeStat there 🎉.