Skip to main content

GitHub API

You can use mergestat to query the GitHub API (v4). Constraints in your SQL query are pushed to the GitHub API as much as possible. For instance, if your query includes an ORDER BY clause and if items can be ordered in the GitHub API response (on the specified column), your query can avoid doing a full table scan and rely on the ordering returned by the API.

Authenticating

You must provide an authentication token in order to use the GitHub API tables. You can create a personal access token following these instructions. mergestat will look for a GITHUB_TOKEN environment variable when executing, to use for authentication. This is also true if running as a runtime loadable extension.

Rate Limiting

All API requests to GitHub are rate limited. The following tables make use of the GitHub GraphQL API (v4), which rate limits additionally based on the "complexity" of GraphQL queries. Generally speaking, the more fields/relations in your GraphQL query, the higher the "cost" of a single API request, and the faster you may reach a rate limit. Depending on your SQL query, it's hard to know ahead of time what a good client-side rate limit is. By default, each of the tables below will fetch 100 items per page and permit 2 API requests per second. You can override both of these parameters by setting the following environment variables:

  1. GITHUB_PER_PAGE - expects an integer between 1 and 100, sets how many items are fetched per-page in API calls that paginate results.
  2. GITHUB_RATE_LIMIT - expressed in the form (number of requests) / (number of seconds) (i.e. 1/3 means at most 1 request per 3 seconds)

If you encounter a rate limit error that looks like You have exceeded a secondary rate limit, consider setting the GITHUB_PER_PAGE value to a lower number. If you have a large number of items to scan in your query, it may take longer, but you should avoid hitting a rate limit error.

github_stargazers

Table-valued-function that returns a list of users who have starred a repository.

ColumnType
loginTEXT
emailTEXT
nameTEXT
bioTEXT
companyTEXT
avatar_urlTEXT
created_atDATETIME
updated_atDATETIME
twitterTEXT
websiteTEXT
locationTEXT
starred_atDATETIME

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_stargazers('mergestat', 'mergestat');
SELECT * FROM github_stargazers('mergestat/mergestat'); -- both are equivalent

github_starred_repos

Table-valued-function that returns a list of repositories a user has starred.

ColumnType
nameTEXT
urlTEXT
descriptionTEXT
created_atDATETIME
pushed_atDATETIME
updated_atDATETIME
stargazer_countINT
name_with_ownerTEXT
starred_atDATETIME

Params:

  1. login - the login of a GitHub user
SELECT * FROM github_starred_repos('patrickdevivo')

github_stargazer_count

Scalar function that returns the number of stars a GitHub repository has.

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT github_stargazer_count('mergestat', 'mergestat');
SELECT github_stargazer_count('mergestat/mergestat'); -- both are equivalent

github_user

Scalar function that returns information about a GitHub user as JSON.

Params:

  1. login - the user's login
SELECT github_user('patrickdevivo');

github_user_repos and github_org_repos

Table-valued function that returns all the repositories belonging to a user or an organization.

ColumnType
created_atDATETIME
database_idINT
default_branch_ref_nameTEXT
default_branch_ref_prefixTEXT
descriptionTEXT
disk_usageINT
fork_countINT
homepage_urlTEXT
is_archivedBOOLEAN
is_disabledBOOLEAN
is_forkBOOLEAN
is_mirrorBOOLEAN
is_privateBOOLEAN
issue_countINT
latest_release_authorTEXT
latest_release_created_atDATETIME
latest_release_nameTEXT
latest_release_published_atDATETIME
license_keyTEXT
license_nameTEXT
nameTEXT
open_graph_image_urlTEXT
primary_languageTEXT
pull_request_countINT
pushed_atDATETIME
release_countINT
stargazer_countINT
topicsJSON
updated_atDATETIME
watcher_countINT

Params:

  1. login - the login of a GitHub user or organization
  2. affiliations - a comma-separated list of repository affiliations. Can be: OWNER, COLLABORATOR or ORGANIZATION_MEMBER
SELECT * FROM github_user_repos('patrickdevivo')
SELECT * FROM github_org_repos('mergestat')
SELECT * FROM github_user_repos('patrickdevivo', 'OWNER')
SELECT * FROM github_org_repos('mergestat', 'OWNER,COLLABORATOR')

github_repo_issues

Table-valued-function that returns all the issues of a GitHub repository.

ColumnType
ownerTEXT
reponameTEXT
author_loginTEXT
bodyTEXT
closedBOOLEAN
closed_atDATETIME
comment_countINT
created_atDATETIME
created_via_emailBOOLEAN
database_idTEXT
editor_loginTEXT
includes_created_editBOOLEAN
label_countINT
last_edited_atDATETIME
lockedBOOLEAN
milestone_countINT
numberINT
participant_countINT
published_atDATETIME
reaction_countINT
stateTEXT
titleTEXT
updated_atDATETIME
urlTEXT

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_repo_issues('mergestat/mergestat');
SELECT * FROM github_repo_issues('mergestat', 'mergestat'); -- both are equivalent

github_repo_prs

Table-valued-function that returns all the pull requests of a GitHub repository.

ColumnType
additionsINT
author_loginTEXT
author_associationTEXT
author_avatar_urlTEXT
author_nameTEXT
base_ref_oidTEXT
base_ref_nameTEXT
base_repository_nameTEXT
bodyTEXT
changed_filesINT
closedBOOLEAN
closed_atDATETIME
comment_countINT
commit_countINT
created_atTEXT
created_via_emailBOOLEAN
database_idINT
deletionsINT
editor_loginTEXT
head_ref_nameTEXT
head_ref_oidTEXT
head_repository_nameTEXT
is_draftINT
label_countINT
last_edited_atDATETIME
lockedBOOLEAN
maintainer_can_modifyBOOLEAN
mergeableTEXT
mergedBOOLEAN
merged_atDATETIME
merged_byTEXT
numberINT
participant_countINT
published_atDATETIME
review_decisionTEXT
stateTEXT
titleTEXT
updated_atDATETIME
urlTEXT

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_repo_prs('mergestat/mergestat');
SELECT * FROM github_repo_prs('mergestat', 'mergestat'); -- both are equivalent

github_repo_branch_protections

Table-valued-function that returns all the branch protection rules set on a GitHub repository (requires GitHub access token to have admin privileges).

ColumnType
allow_deletionsBOOLEAN
allows_force_pushesBOOLEAN
creator_loginTEXT
database_idINT
dismisses_stale_reviewsBOOLEAN
is_admin_enforcedBOOLEAN
patternTEXT
required_approving_review_countINT
required_status_check_contextsBOOLEAN
requires_approving_reviewsDATETIME
requires_code_owners_reviewsBOOLEAN
requires_commit_signatureBOOLEAN
requires_conversation_resolutionBOOLEAN
requires_linear_historyBOOLEAN
requires_status_checksBOOLEAN
requires_strict_status_checksBOOLEAN
restricts_pushesBOOLEAN
restricts_review_dismissalBOOLEAN

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_repo_branch_protections('mergestat/mergestat');
SELECT * FROM github_repo_branch_protections('mergestat', 'mergestat');
SELECT * FROM github_branch_protections('mergestat/mergestat');
SELECT * FROM github_branch_protections('mergestat', 'mergestat'); -- all are equivalent

github_repo_file_content

Scalar function that returns the contents of a file in a GitHub repository

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
  3. expression - either a simple file path (README.md) or a rev-parse suitable expression that includes a path (HEAD:README.md or <some-sha>:README.md)
SELECT github_stargazer_count('mergestat', 'mergestat', 'README.md');
SELECT github_stargazer_count('mergestat/mergestat', 'README.md'); -- both are equivalent

github_repo_issue_comments

Table valued function that returns comments on a given issue.

ColumnType
author_loginTEXT
author_urlTEXT
bodyTEXT
created_atTEXT
database_idINT
idTEXT
updated_atTEXT
urlTEXT
issue_idTEXT
issue_numberINT

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
  3. issue_number - the issue number
SELECT github_repo_issue_comments('mergestat/mergestat', 100);
SELECT github_issue_comments('mergestat/mergestat', 100);

github_repo_pr_comments

Table valued function that returns all comments on a given pull request.

ColumnType
author_loginTEXT
author_urlTEXT
bodyTEXT
created_atTEXT
database_idINT
idTEXT
updated_atTEXT
urlTEXT
pr_idTEXT
pr_numberINT

Params:

  1. fullNameOrOwner - either the full repo name mergestat/mergestat or just the owner mergestat (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
  3. number - the pull request number to pull comments from
SELECT github_repo_pr_comments('mergestat/mergestat', 200);
SELECT github_repo_pr_comments('mergestat', 'mergestat', 200);
SELECT github_pr_comments('mergestat/mergestat', 200);
SELECT github_pr_comments('mergestat',' mergestat', 200);