Skip to main content

2 posts tagged with "sqlite"

View All Tags

Patrick DeVivo

Today we're very happy to announce a new flavor of MergeStat! If you've been following our progress over the last year, you'll know that our mission is to enable operational analytics for software engineering teams. You'll also know that our approach has been heavily based on SQLite (and its virtual table mechanism) to bring data from git repositories (source code, metadata, the GitHub API, etc) into a SQL context for flexible querying and analytics.

MergeStat Management Console Illustration

This has enabled initial use cases in many interesting domains related to the software-development-lifecycle, from audit and compliance, to code querying, to release time metrics, to technical debt surfacing, and much, much more.

For the past several months, we’ve been working on an approach in service of this mission that uses a different mechanism for enabling that SQL context. That mechanism is based on syncing data into a Postgres database for downstream querying, vs using a local SQLite based approach.

This grants us two significant advantages:

  1. More compatibility with the ecosystem of data (and BI/visualization) tools
  2. Much faster query time for data spread across multiple API sources

More Compatibility

We love SQLite, but the fact of the matter is that it’s much easier to integrate a data visualization or BI product (open-source or not) with a Postgres server (or Postgres compatible server) than with a SQLite database file. A central part of our mission has always been not just enabling SQL, but allowing our SQL solution to play well with the wide array of existing tools that companies, teams and individuals are already using.

We want anyone to be able to query MergeStat data from tools like Metabase, Grafana, Tableau, Superset, etc. Postgres compatibility takes us a step in that direction, and we’re already working with early users integrating their SDLC data with these types of tools, via MergeStat.

Faster Queries

Our SQLite virtual-table based approach is an effective way to query a local data source (such as a git repository on disk). However, it begins to fall short when it comes to querying over large sets of data spread across many pages in a web API (or multiple APIs). What ends up happening is queries spend much more time waiting for HTTP requests to finish than actually executing SQL. This means at query time, the user is forced to wait for data collection to occur. Depending on the scope of the data involved, this can potentially take a very long time. With our new approach, data is synced and maintained in a background process, so that data collection (from potentially slow sources) occurs out-of-band from user queries.

What’s Next?

Our SQLite approach isn’t going anywhere. We still intend to invest in that and continue building it into a valuable open-source CLI that anyone can use. It’s been renamed MergeStat Lite and now lives at mergestat/mergestat-lite.

Our new approach now lives at mergestat/mergestat. Our documentation has been updated to reflect these changes, and we couldn’t be more excited to share more in the coming days and weeks.

We have some great features planned, and will be spending more time showcasing various use-cases from our growing community.

Join our Slack

As usual, our community Slack is a great place to find help and ask questions. We're always happy to chat about MergeStat there 🎉!

Patrick DeVivo

SQLite 3.38.0 (released 2022-02-22) added support for built-in JSON functions (no longer a compile-time option) and two new JSON operators, -> and ->>, similar to ones available in MySQL and PostgreSQL.

MergeStat is a SQL interface to data in git repositories, based on SQLite. As such, we can now take advantage of these new operators in our queries, wherever JSON data is returned.

note

To install the MergeStat CLI via homebrew, run:

brew tap mergestat/mergestat
brew install mergestat

Or see other options

package.json

An easy source of JSON to start with is the package.json file present in most JavaScript/TypeScript codebases. Let's take a look at the one in the popular tailwindlabs/tailwindcss repo.

SELECT contents->>'name', contents->>'license' FROM files WHERE path = 'package.json'
+-------------------+----------------------+
| CONTENTS->>'NAME' | CONTENTS->>'LICENSE' |
+-------------------+----------------------+
| tailwindcss | MIT |
+-------------------+----------------------+

Here we use contents->>'name' and contents->>'license' to pull out the name and license fields from the package.json JSON file.

Let's take this up a level, and look at at the name and license fields for all repos in an org where a package.json file is present. Sticking with tailwindlabs (and now adding a GITHUB_TOKEN when we execute the following):

-- CTE (https://www.sqlite.org/lang_with.html)
-- to select all the package.json files from repos in the tailwindlabs GitHub org
WITH package_json_files AS (
SELECT
name,
github_repo_file_content('tailwindlabs', repo.name, 'package.json') AS package_json
FROM github_org_repos('tailwindlabs') repo
)
SELECT
name as repo_name,
package_json->>'name',
package_json->>'license'
FROM package_json_files
WHERE package_json IS NOT NULL
note

This query may take some time to run, as it makes a fair number of GitHub API requests.

+-----------------------------+-----------------------------+--------------------------+
| REPO_NAME | PACKAGE_JSON->>'NAME' | PACKAGE_JSON->>'LICENSE' |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss | tailwindcss | MIT |
+-----------------------------+-----------------------------+--------------------------+
| webpack-starter | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss.com | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-plugin-examples | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-intellisense | root | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-playground | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-custom-forms | @tailwindcss/custom-forms | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-typography | @tailwindcss/typography | MIT |
+-----------------------------+-----------------------------+--------------------------+
| heroicons | heroicons | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwindui-vue | @tailwindui/vue | MIT |
+-----------------------------+-----------------------------+--------------------------+
| blog.tailwindcss.com | tailwind-blog | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindui-react | @tailwindui/react | MIT |
+-----------------------------+-----------------------------+--------------------------+
| heroicons.com | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| play.tailwindcss.com | play.tailwindcss.com | NULL |
+-----------------------------+-----------------------------+--------------------------+
| headlessui | headlessui | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwind-play-api | NULL | NULL |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-aspect-ratio | @tailwindcss/aspect-ratio | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-forms | @tailwindcss/forms | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-line-clamp | @tailwindcss/line-clamp | MIT |
+-----------------------------+-----------------------------+--------------------------+
| tailwindcss-jit | @tailwindcss/jit | MIT |
+-----------------------------+-----------------------------+--------------------------+
| prettier-plugin-tailwindcss | prettier-plugin-tailwindcss | NULL |
+-----------------------------+-----------------------------+--------------------------+

We see that most repos with a package.json file also have a name, while some do not (NULL). All the declared licences appear to be MIT.

Dependencies

Finally, let's take a look at all the dependencies declared in package.json files across (JavaScript/TypeScript) codebases in an org. This time, let's try on the freeCodeCamp GitHub org.

WITH package_json_files AS (
SELECT
name,
github_repo_file_content('freeCodeCamp', repo.name, 'package.json') AS package_json
FROM github_org_repos('freeCodeCamp') repo
WHERE (primary_language = 'TypeScript' OR primary_language = 'JavaScript')
)
SELECT
count(*),
json_group_array(name) AS repo_names,
dep.key
FROM package_json_files, json_each(package_json->'dependencies') dep
WHERE package_json IS NOT NULL
GROUP BY dep.key
ORDER BY count(*) DESC

This query will show the most frequently used dependencies within a GitHub org (as extracted from the dependencies JSON value of package.json files). Here are the top 10 results from the above query, on freeCodeCamp:

  1. dotenv (35)
  2. express (31)
  3. body-parser (26)
  4. react-dom (16)
  5. react (16)
  6. lodash (16)
  7. cors (14)
  8. mongoose (12)
  9. chai (12)
  10. passport (11)
Join our Slack

We've recently launched a community Slack - feel free to stop in if you have questions or anything to share 🎉.