Skip to main content

2 posts tagged with "postgres"

View All Tags

Patrick DeVivo

MergeStat's mission is to make it possible to query anything involved in building & shipping software with SQL. Recently, we had a user looking to query YAML files across their git repos, wanting to extract and aggregate values from config files.

We thought for a bit, and came up with a solution. An odd one, but one that works surprisingly well ๐Ÿ˜ƒ.

How it worksโ€‹

  1. Ensure the PLV8 extension is installed in Postgres. This allows us to run JavaScript in the database.
  2. Create a user-defined function plv8_yaml_to_json() that uses an existing JavaScript YAML parser, to convert YAML strings into JSON.
  3. Write and execute SQL that uses the new function and the native Postgres JSON operators to query what you're looking for ๐ŸŽ‰.

Here it is altogether!

What can we query?โ€‹

Well, to showcase this, let's look at a git repo with a lot of YAML. One good example is the public bitnami/charts repo, which is a collection of Helm Charts for popular OSS applications to run on Kubernetes.

List all Helm charts (and their annotated category)โ€‹

WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
chart->>'name' AS chart_name,
chart->>'description' AS chart_description,
chart->'annotations'->>'category' AS category
FROM bitnami_charts
chart_namechart_descriptioncategory
airflowApache Airflow is a tool to express and execute workflows as directed acyclic graphs (DAGs). It includes utilities to schedule tasks, monitor task progress and handle task dependencies.WorkFlow
apacheApache HTTP Server is an open-source HTTP server. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.Infrastructure
appsmithAppsmith is an open source platform for building and maintaining internal tools, such as custom dashboards, admin panels or CRUD apps.CMS
argo-cdArgo CD is a continuous delivery tool for Kubernetes based on GitOps.Infrastructure
argo-workflowsArgo Workflows is meant to orchestrate Kubernetes jobs in parallel. It uses DAG and step-based workflowsInfrastructure
...

How many charts are in each category?โ€‹

WITH bitnami_charts AS (
SELECT path, plv8_yaml_to_json(contents) AS chart FROM git_files
JOIN repos ON git_files.repo_id = repos.id
WHERE repo = 'https://github.com/bitnami/charts'
AND path LIKE 'bitnami/%/Chart.yaml'
)
SELECT
count(*),
chart->'annotations'->>'category' AS category
FROM bitnami_charts
GROUP BY category
ORDER BY count(*) DESC

Bitnami chart count by category

Hopefully that gives you a sense of what's possible chaining together YAML, MergeStat, JavaScript, and Postgres!

Join our Slack

If you found this interesting, hop in our community Slack! We're always happy to chat about MergeStat there ๐ŸŽ‰.

Patrick DeVivo

Today we're very happy to announce a new flavor of MergeStat! If you've been following our progress over the last year, you'll know that our mission is to enable operational analytics for software engineering teams. You'll also know that our approach has been heavily based on SQLite (and its virtual table mechanism) to bring data from git repositories (source code, metadata, the GitHub API, etc) into a SQL context for flexible querying and analytics.

MergeStat Management Console Illustration

This has enabled initial use cases in many interesting domains related to the software-development-lifecycle, from audit and compliance, to code querying, to release time metrics, to technical debt surfacing, and much, much more.

For the past several months, weโ€™ve been working on an approach in service of this mission that uses a different mechanism for enabling that SQL context. That mechanism is based on syncing data into a Postgres database for downstream querying, vs using a local SQLite based approach.

This grants us two significant advantages:

  1. More compatibility with the ecosystem of data (and BI/visualization) tools
  2. Much faster query time for data spread across multiple API sources

More Compatibilityโ€‹

We love SQLite, but the fact of the matter is that itโ€™s much easier to integrate a data visualization or BI product (open-source or not) with a Postgres server (or Postgres compatible server) than with a SQLite database file. A central part of our mission has always been not just enabling SQL, but allowing our SQL solution to play well with the wide array of existing tools that companies, teams and individuals are already using.

We want anyone to be able to query MergeStat data from tools like Metabase, Grafana, Tableau, Superset, etc. Postgres compatibility takes us a step in that direction, and weโ€™re already working with early users integrating their SDLC data with these types of tools, via MergeStat.

Faster Queriesโ€‹

Our SQLite virtual-table based approach is an effective way to query a local data source (such as a git repository on disk). However, it begins to fall short when it comes to querying over large sets of data spread across many pages in a web API (or multiple APIs). What ends up happening is queries spend much more time waiting for HTTP requests to finish than actually executing SQL. This means at query time, the user is forced to wait for data collection to occur. Depending on the scope of the data involved, this can potentially take a very long time. With our new approach, data is synced and maintained in a background process, so that data collection (from potentially slow sources) occurs out-of-band from user queries.

Whatโ€™s Next?โ€‹

Our SQLite approach isnโ€™t going anywhere. We still intend to invest in that and continue building it into a valuable open-source CLI that anyone can use. Itโ€™s been renamed MergeStat Lite and now lives at mergestat/mergestat-lite.

Our new approach now lives at mergestat/mergestat. Our documentation has been updated to reflect these changes, and we couldnโ€™t be more excited to share more in the coming days and weeks.

We have some great features planned, and will be spending more time showcasing various use-cases from our growing community.

Join our Slack

As usual, our community Slack is a great place to find help and ask questions. We're always happy to chat about MergeStat there ๐ŸŽ‰!