Skip to main content

4 posts tagged with "sql"

View All Tags

Patrick DeVivo

We’re proud to announce that MergeStat has raised over $1.2m in pre-seed funding led by OSS Capital, with participation from Caffeinated Capital and prominent angel investors.

At MergeStat, our mission is to enable operational analytics for software engineering teams via powerful, open-source tooling and systems.

We believe there is tremendous value in connecting SQL with the systems used to build and ship software. Engineering organizations are often “black boxes” driven by intuition and gut-feel decisions. By enabling access to the processes and artifacts of software engineering as data, MergeStat helps organizations become data-driven in the ways most relevant to their needs.

Our PostgreSQL approach is based on applying SQL to the data sources involved in building and shipping software. SQL not only grants the ability to ask adhoc questions, but also brings compatibility with many BI, visualization, and data platforms.

We’re working with a variety of organizations using MergeStat to answer questions about the software-development-lifecycle, in the tools they prefer. To date, we’ve worked with teams using our SQL approach to answer questions around:

  • Audit and compliance
  • Code and dependency version sprawl
  • Engineering metrics and transparency (DORA metrics)
  • Code pattern monitoring and reporting
  • Developer on-boarding
  • Vulnerability reporting
  • …and more

“MergeStat is bringing the lingua franca of data to code: applying SQL to the modern SDLC will unlock tremendous value for all software developers. We are greatly honored to partner with Patrick DeVivo on this important mission!”

– Joseph Jacks, Founder and General Partner at OSS Capital

Over the past year, MergeStat has grown into an amazing team with a growing open-source presence. We’re very excited for our next chapter, as we begin working with more organizations looking to answer questions about how their engineering teams operate.


Patrick DeVivo

If you’ve spent any time around code, you’ll know that // TODO comments are a fact of life. They're a common pattern, regardless of how you feel about them! For instance, the Kubernetes codebase has over 2,000, and Linux has over 3,000.

In the MergeStat codebase, we use them as a low-effort way to track small, technical debt items. Sometimes, a future fix or refactor might not warrant a ticket (it’s too small and too specific to a piece of implementation detail), but it’s still worth making a note. TODO comments are a good fit for this because they are:

  • Low effort - very easy to add and remove, just leave a comment in the code
  • Safe from context switching - no need to switch into a ticketing system, stay in the editor and near the relevant code
  • Tracked in version control - so you have a history and audit trail

We recently connected with Ivan Smirnov, CTO of Elude on this topic, and were excited to learn about his enthusiasm for tracking TODO comments 🙂. He shared that during his time at Google, there was an internal system which aggregated TODOs across codebases, as a way of surfacing parts of code worth returning to. He missed having a similar solution in his role at Elude.

Luckily, we were able to help with a MergeStat + Grafana based solution! Elude operates a self-hosted instance of MergeStat (using our Kubernetes helm chart), and connects to its PostgreSQL database with a Grafana instance. We collaborated on putting together a starting “TODO Tracker” dashboard, which is available in our examples as a Grafana export:

Screenshot of Grafana board tracking TODOs

"Elude currently uses TODO comments as a low friction mechanism to track technical debt. MergeStat is the missing link that allows us to turn these comments into trackable, actionable tasks." – Ivan Smirnov, Elude CTO

The SQL involved looks something like this:

REPLACE(repos.repo, '', '') AS repo,
repos.repo || '/blob/main/' || git_blame.path || '#L' || git_blame.line_no AS url -- generate a link to the line in GitHub
FROM git_blame
INNER JOIN repos ON = git_blame.repo_id
WHERE git_blame.line LIKE '%TODO%'
ORDER BY git_blame.author_when ASC

and should be fairly easy to customize to different needs:

  • Only apply to certain repos
  • Filter out certain file paths by pattern
  • Look for FIXME and BUG comments as well
  • Parse out "assignees" (i.e. TODO(patrickdevivo))
  • etc...

If you're interested in taking a look at your own // TODO comments, go ahead and try out a local instance of MergeStat!

Join our Slack

Our community Slack is a great place to find help and ask questions. We're always happy to chat about MergeStat there 🎉!

Patrick DeVivo

Today we're excited to announce the availability of in-app SQL querying in the latest versions of MergeStat! 🎉

Illustration of the Query UI

One of the more significant pain points noted by our early adopters has been the need for external tools to access the data MergeStat provides, even for simple, one-off queries. Many users (ourselves included) lean on existing data products such as Grafana, Metabase and Superset to consume MergeStat data (in dashboards, alerts and reports).

With the addition of our query interface (in the Queries tab of our management console), MergeStat users can now execute SQL directly in our app.

Query results can be copied or downloaded as CSV or JSON, for quick use in other tools and contexts. We believe this is a significant step-forward in our app's functionality, and will continue to invest in this area of our management console. Keep a lookout for additional features, including:

  • Saved and examples queries
  • Inline schema documentation and editor auto-completions
  • Basic charting and data visualization
  • Query execution history

It's important to note that as always, the MergeStat PostgreSQL database can be connected directly to by BI tools, desktop SQL clients, or SQL drivers.

Check out the latest MergeStat release to get started!

Join our Slack

Our community Slack is a great place to find help and ask questions. We're always happy to chat about MergeStat there 🎉!

Patrick DeVivo

Today we're very happy to announce a new flavor of MergeStat! If you've been following our progress over the last year, you'll know that our mission is to enable operational analytics for software engineering teams. You'll also know that our approach has been heavily based on SQLite (and its virtual table mechanism) to bring data from git repositories (source code, metadata, the GitHub API, etc) into a SQL context for flexible querying and analytics.

MergeStat Management Console Illustration

This has enabled initial use cases in many interesting domains related to the software-development-lifecycle, from audit and compliance, to code querying, to release time metrics, to technical debt surfacing, and much, much more.

For the past several months, we’ve been working on an approach in service of this mission that uses a different mechanism for enabling that SQL context. That mechanism is based on syncing data into a Postgres database for downstream querying, vs using a local SQLite based approach.

This grants us two significant advantages:

  1. More compatibility with the ecosystem of data (and BI/visualization) tools
  2. Much faster query time for data spread across multiple API sources

More Compatibility

We love SQLite, but the fact of the matter is that it’s much easier to integrate a data visualization or BI product (open-source or not) with a Postgres server (or Postgres compatible server) than with a SQLite database file. A central part of our mission has always been not just enabling SQL, but allowing our SQL solution to play well with the wide array of existing tools that companies, teams and individuals are already using.

We want anyone to be able to query MergeStat data from tools like Metabase, Grafana, Tableau, Superset, etc. Postgres compatibility takes us a step in that direction, and we’re already working with early users integrating their SDLC data with these types of tools, via MergeStat.

Faster Queries

Our SQLite virtual-table based approach is an effective way to query a local data source (such as a git repository on disk). However, it begins to fall short when it comes to querying over large sets of data spread across many pages in a web API (or multiple APIs). What ends up happening is queries spend much more time waiting for HTTP requests to finish than actually executing SQL. This means at query time, the user is forced to wait for data collection to occur. Depending on the scope of the data involved, this can potentially take a very long time. With our new approach, data is synced and maintained in a background process, so that data collection (from potentially slow sources) occurs out-of-band from user queries.

What’s Next?

Our SQLite approach isn’t going anywhere. We still intend to invest in that and continue building it into a valuable open-source CLI that anyone can use. It’s been renamed MergeStat Lite and now lives at mergestat/mergestat-lite.

Our new approach now lives at mergestat/mergestat. Our documentation has been updated to reflect these changes, and we couldn’t be more excited to share more in the coming days and weeks.

We have some great features planned, and will be spending more time showcasing various use-cases from our growing community.

Join our Slack

As usual, our community Slack is a great place to find help and ask questions. We're always happy to chat about MergeStat there 🎉!