It's a new year, and a post listing the top 30 committers to the Blender project in 2021 was on the front page of HN this morning.
"Below is the list of top committers to the Blender project in 2021. The amount of commits obviously doesn't mean much, but it's a neutral metric to put limelight on people who made Blender possible last year."
As the post says, counting commits doesn't mean much on its own, but it's a decent proxy for relative activity within a codebase. As a maintainer of the MergeStat project, a SQL interface to data in git, I wanted to show how to make a similar list for any repo and take it a step further with some additional metrics.
Take a look at our CLI guide to run the following queries yourself, on any repo.
Top 30 Contributors by Commit Count
Replicating the list in the original blog post can be done with the following query:
SELECT
    count(*), author_name, author_email
FROM commits
WHERE parents < 2 -- ignore merge commits
    AND strftime('%Y', author_when) = '2021' -- commits only authored in 2021
GROUP BY author_name, author_email
ORDER BY count(*) DESC
LIMIT 30
This yields a list of names (ordered and with counts) very similar to what's in the original post.
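To see what the WHERE and GROUP BY clauses are doing without pointing at a real repository, here is a minimal Python sketch that runs the same query against a hand-made, in-memory SQLite table. The table layout (an integer parents count and an ISO 8601 author_when string) and the sample rows are assumptions made up for illustration, not MergeStat's actual schema or any real commit data.

```python
import sqlite3

# Hypothetical stand-in for the commits table: only the columns the query uses.
SAMPLE_COMMITS = [
    # (hash, author_name, author_email, author_when, parents)
    ("a1", "Ada", "ada@example.com", "2021-01-05T10:00:00", 1),
    ("a2", "Ada", "ada@example.com", "2021-06-12T09:30:00", 1),
    ("a3", "Ada", "ada@example.com", "2021-07-01T12:00:00", 2),  # merge commit
    ("b1", "Bob", "bob@example.com", "2021-03-20T15:45:00", 1),
    ("b2", "Bob", "bob@example.com", "2020-12-31T23:00:00", 1),  # authored in 2020
]

# Same query as in the post, unchanged apart from whitespace.
QUERY = """
SELECT count(*), author_name, author_email
FROM commits
WHERE parents < 2
    AND strftime('%Y', author_when) = '2021'
GROUP BY author_name, author_email
ORDER BY count(*) DESC
LIMIT 30
"""


def top_committers(rows):
    """Load the toy rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits ("
        "hash TEXT, author_name TEXT, author_email TEXT, "
        "author_when TEXT, parents INTEGER)"
    )
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for count, name, email in top_committers(SAMPLE_COMMITS):
        print(count, name, email)
```

In this toy data the merge commit and the commit authored in 2020 are both filtered out before grouping, so they never reach the per-author counts.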
Adding Commit Stats
The stats table also lets us look at the number of files modified in each commit, as well as the lines of code (LOC) added or removed. We can join the stats table with commits to measure:
- Total lines added by a contributor
- Total lines removed by a contributor
- Distinct files modified by a contributor
All of this is in addition to counting the number of commits:
SELECT
    count(DISTINCT hash) AS commits,
    sum(additions),
    sum(deletions),
    count(DISTINCT file_path),
    author_name, author_email
FROM commits, stats('', commits.hash)
WHERE parents < 2 -- ignore merge commits
    AND strftime('%Y', author_when) = '2021' -- commits only authored in 2021
GROUP BY author_name, author_email
ORDER BY count(*) DESC -- after the join, count(*) counts (commit, file) rows, not commits
LIMIT 30
As with commit counts, LOC added or removed doesn't mean much independently. Files modified (distinct files changed by an author over the year) is a bit more interesting - depending on the size and nature of a codebase, it could be a measure of how "deeply" involved a particular contributor is, i.e. someone who contributes to all aspects of the project, not just a subset of it.
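One detail of the joined query is worth spelling out: every commit contributes one row per file it touched, which is why commits are counted with count(DISTINCT hash) and why count(*) gives a different number. The sketch below shows this with two toy tables in an in-memory SQLite database. It uses an ordinary JOIN on a hypothetical stats table, not MergeStat's stats() table-valued function, and all rows are invented for illustration.

```python
import sqlite3

# Hypothetical toy data: (hash, author_name)
COMMITS = [("a1", "Ada"), ("a2", "Ada"), ("b1", "Bob")]

# Hypothetical per-file stats: (hash, file_path, additions, deletions)
STATS = [
    ("a1", "x.c", 10, 2),
    ("a1", "y.c", 5, 0),
    ("a2", "x.c", 1, 1),   # same file as in a1, so still 2 distinct files for Ada
    ("b1", "z.c", 100, 50),
]


def contributor_stats(commits, stats):
    """Return per-author (name, commits, joined_rows, additions, deletions, files)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (hash TEXT, author_name TEXT)")
    conn.execute(
        "CREATE TABLE stats ("
        "hash TEXT, file_path TEXT, additions INTEGER, deletions INTEGER)"
    )
    conn.executemany("INSERT INTO commits VALUES (?, ?)", commits)
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", stats)
    rows = conn.execute(
        """
        SELECT author_name,
               count(DISTINCT commits.hash) AS commits,
               count(*) AS joined_rows,
               sum(additions),
               sum(deletions),
               count(DISTINCT file_path)
        FROM commits JOIN stats ON stats.hash = commits.hash
        GROUP BY author_name
        ORDER BY author_name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in contributor_stats(COMMITS, STATS):
        print(row)
```

Here Ada has two commits but three joined rows, so ranking by count(*) rewards commits that touch many files, while ranking by the commits column ranks purely by commit count.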
Making Some Graphs
Finally, by adding the --format csv flag to the mergestat command when executing the above queries, we can copy the output into a spreadsheet to make some charts.
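If you'd rather skip the spreadsheet, the CSV can also be processed with a few lines of Python. The sketch below uses only the standard library to parse CSV text and draw a rough text bar chart. It assumes the CSV has a header row and that the columns are named count(*) and author_name; check the actual output of your query and adjust the column names accordingly. The sample data is made up.

```python
import csv
import io

# Made-up sample in the shape we assume the CSV output has (header row included).
SAMPLE_CSV = """count(*),author_name,author_email
120,Ada,ada@example.com
60,Bob,bob@example.com
"""


def read_counts(csv_text, label_col, value_col):
    """Parse CSV text into a list of (label, integer value) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[label_col], int(row[value_col])) for row in reader]


def text_bar_chart(pairs, width=40):
    """Render one line per pair, scaling bars so the largest value fills `width`."""
    top = max((value for _, value in pairs), default=0)
    lines = []
    for label, value in pairs:
        bar_len = round(value / top * width) if top > 0 else 0
        lines.append(f"{label:<20} {'#' * bar_len} {value}")
    return lines


if __name__ == "__main__":
    pairs = read_counts(SAMPLE_CSV, "author_name", "count(*)")
    for line in text_bar_chart(pairs, width=10):
        print(line)
```

To use it on real data, save the CSV output to a file, read the file contents into a string, and pass that string to read_counts in place of SAMPLE_CSV.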
These queries should be portable to your own repositories (or any other). Try them out in our web app or with our CLI.
We've recently launched a community Slack - feel free to stop in if you have questions or anything to share.