mohanmca / nbestimate

Estimate of Public Jupyter Notebooks on GitHub

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Estimate of Public Jupyter Notebooks on GitHub

A brief history of this project:

  • Late-2014 to mid-2016: I wrote a script that scrapes the GitHub web search UI for the count, appends to a CSV, executes a notebook, and stores the results in a gist at https://gist.github.com/parente/facb555dfbae28e817e0. I scheduled the script to run daily.
  • Mid-2106 to Late-2016: The GitHub web search UI started requiring authentication to see global search results. I stopped collecting data.
  • Late-2016 to early-2019: I rewrote the process to include a human-in-the-loop who entered the hit count after viewing the search results page. I moved the CSV, notebook, and scripts to this repo, and sporadically ran the script.
  • Early-2019: I found out that the GitHub search API now supports global search. I automated the entire collection process again and set it to run on TravisCI on a daily schedule.

About

Estimate of Public Jupyter Notebooks on GitHub

License:MIT License


Languages

Language:Jupyter Notebook 98.9%Language:Python 1.1%