Analyzing the GitHub Repositories of Research Papers
Methodology & Data Set
This repository provides the source code and the database used for an analysis of all GitHub code repositories linked in scientific papers. The database was retrieved by querying the Microsoft Academic Graph, which is licensed under ODC-By. We analyzed the repositories and their associated papers along several dimensions, including repository popularity (stars and forks), contributors, documentation, programming languages, and research fields.
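To illustrate what "repositories linked in scientific papers" means in practice, the following minimal sketch pulls GitHub repository identifiers out of a piece of text. It is an illustration only and not necessarily how the database in this repository was built: the regular expression, the function name `extract_github_repos`, and the normalization to lowercase `owner/repo` strings are all assumptions made for this example.

```python
import re

# Hypothetical pattern for "github.com/<owner>/<repo>"; the character classes
# are a simplification chosen for this example.
_GITHUB_REPO = re.compile(
    r"github\.com/([A-Za-z0-9][A-Za-z0-9-]*)/([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def extract_github_repos(text):
    """Return distinct 'owner/repo' identifiers in order of first appearance."""
    repos = []
    for owner, repo in _GITHUB_REPO.findall(text):
        repo = repo.rstrip(".")      # drop sentence-ending punctuation
        if repo.endswith(".git"):    # drop a clone-URL suffix
            repo = repo[:-4]
        if not repo:
            continue
        key = f"{owner}/{repo}".lower()  # treat names as case-insensitive
        if key not in repos:
            repos.append(key)
    return repos
```

For example, calling `extract_github_repos` on a sentence that mentions the same repository twice (once with a trailing `.git`) yields a single entry for it.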
Results
We observe that the numbers of stars and of forks across all repositories each follow a power-law distribution. In the majority of cases, only one of a paper's authors contributes to the repository. The manuals on GitHub are mostly short, consisting of only a few sentences. Most of the source code is written in Python. Both the papers that contain the repository URLs and their authors typically come from the field of artificial intelligence (AI).
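As a rough way to examine the power-law observation on star or fork counts, the sketch below estimates the exponent with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), applied to all values at or above a cutoff. This is an assumption-laden illustration and not necessarily the procedure used in the paper: the function name `estimate_power_law_exponent` and the default cutoff `x_min=1` are hypothetical, and star counts are discrete, so the continuous formula is only an approximation.

```python
import math


def estimate_power_law_exponent(counts, x_min=1):
    """Estimate alpha for p(x) ~ x^(-alpha) from the values >= x_min.

    Uses the continuous maximum-likelihood estimator; values below x_min
    (for example repositories with zero stars) are ignored.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [c for c in counts if c >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(c / x_min) for c in tail)
    if log_sum == 0:
        raise ValueError("all values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical star counts, not taken from the data set.
example_stars = [0, 1, 2, 4, 8]
alpha = estimate_power_law_exponent(example_stars, x_min=1)
```

An estimate alone does not show that a power law is a good fit; a goodness-of-fit test or a comparison with alternative distributions would be needed for that.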
More Information & Reference
More information about our work can be found in the following paper:
Michael Färber: "Analyzing the GitHub Repositories of Research Papers". Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL'20), Xi'an, China, 2020.
Please cite this paper when referring to our work.
Acknowledgements
We thank Erhan Metin for his contributions to this work.