igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

Home Page: https://www.gharchive.org

Jump in observations from February 2019

grantmcdermott opened this issue

In trying to track some general GitHub trends over time, I've noticed a few discrete jumps in the GH Archive data. One such case is a sustained (15-30%) jump in the number of data observations beginning February 9th, 2019. This jump carries through to event data too: pushes, commits, etc.

Here's a relevant figure (source query at the bottom of this post).
[Figure: ghnobs]

Any idea as to what's going on here?

Thanks in advance and (especially) for this excellent initiative.


BigQuery query for the figure's underlying data:

(SELECT COUNT(id) AS nobs,
        DATE(created_at) AS date
 FROM `githubarchive.month.201901`
 GROUP BY date)
UNION ALL
(SELECT COUNT(id) AS nobs,
        DATE(created_at) AS date
 FROM `githubarchive.month.201902`
 GROUP BY date)
UNION ALL
(SELECT COUNT(id) AS nobs,
        DATE(created_at) AS date
 FROM `githubarchive.month.201903`
 GROUP BY date)
ORDER BY date
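
To see which event types drive the jump, the same daily counts could be split by event type. The query below is a rough sketch, not something run for the figure: it assumes the monthly tables share a schema and that each row has a `type` column (e.g. PushEvent) alongside `id` and `created_at`. It also swaps the three `UNION ALL` subqueries for a BigQuery wildcard table, filtered on the `_TABLE_SUFFIX` pseudo-column.

```sql
-- Sketch: daily observation counts for Jan-Mar 2019, split by event type.
-- Assumption: the `type` column exists in every matched monthly table.
SELECT
  DATE(created_at) AS date,
  type,
  COUNT(id) AS nobs
FROM `githubarchive.month.2019*`
-- _TABLE_SUFFIX holds the part of the table name matched by the wildcard,
-- so '01'..'03' selects the tables 201901, 201902, and 201903.
WHERE _TABLE_SUFFIX BETWEEN '01' AND '03'
GROUP BY date, type
ORDER BY date, type
```

If the jump shows up in only one or two event types, that would point to a change in how those events are recorded rather than a platform-wide shift. If it appears proportionally across all types, a change in collection coverage seems more likely.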
