MergeStat's mission is to make it possible to query anything involved in building & shipping software with SQL. Recently, we had a user looking to query YAML files across their git repos, wanting to extract and aggregate values from config files.
We thought for a bit, and came up with a solution. An odd one, but one that works surprisingly well ๐.
How it worksโ
- Ensure the
PLV8
extension is installed in Postgres. This allows us to run JavaScript in the database. - Create a user-defined function
plv8_yaml_to_json()
that uses an existing JavaScript YAML parser, to convert YAML strings into JSON. - Write and execute SQL that uses the new function and the native Postgres JSON operators to query what you're looking for ๐.
Here it is altogether!
What can we query?โ
Well, to showcase this, let's look at a git repo with a lot of YAML.
One good example is the public bitnami/charts
repo, which is a collection of Helm Charts for popular OSS applications to run on Kubernetes.
List all Helm charts (and their annotated category)โ
WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
chart->>'name' AS chart_name,
chart->>'description' AS chart_description,
chart->'annotations'->>'category' AS category
FROM bitnami_charts
chart_name | chart_description | category |
---|---|---|
airflow | Apache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies. | WorkFlow |
apache | Apache HTTP Server is an open-source HTTP server. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards. | Infrastructure |
appsmith | Appsmith is an open source platform for building and maintaining internal tools, such as custom dashboards, admin panels or CRUD apps. | CMS |
argo-cd | Argo CD is a continuous delivery tool for Kubernetes based on GitOps. | Infrastructure |
argo-workflows | Argo Workflows is meant to orchestrate Kubernetes jobs in parallel. It uses DAG and step-based workflows | Infrastructure |
... |
How many charts are in each category?โ
WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
count(*),
chart->'annotations'->>'category' AS category
FROM bitnami_charts
GROUP BY category
ORDER BY count(*) DESC
Hopefully that gives you a sense of what's possible chaining together YAML, MergeStat, JavaScript, and Postgres!
If you found this interesting, hop in our community Slack! We're always happy to chat about MergeStat there ๐.