michaelfaerber / paper-github-analysis

Data and code for "Analyzing the GitHub Repositories of Research Papers"

Analyzing the GitHub Repositories of Research Papers

Methodology & Data Set

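In this repository, we provide the source code and the database used for an analysis of all GitHub code repositories linked in scientific papers. The database was retrieved by querying the Microsoft Academic Graph (MAG), which is licensed under ODC-By. We analyzed the repositories and their associated papers along several dimensions, including repository popularity (stars and forks), the number of contributing authors, the length of the documentation, the programming language used, and the research field of the papers and their authors.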
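As an illustration of the kind of preprocessing such an analysis involves, the following sketch pulls `owner/repo` identifiers out of free text. It is not the code shipped in this repository: the helper name `extract_github_repos` and the regular expression are hypothetical, and the sketch assumes the paper text (e.g. an abstract) is already available as a string. The MAG query itself is not shown.

```python
import re

# Hypothetical pattern: owner names allow letters, digits and hyphens;
# repository names additionally allow '.', '_' and '-'.
GITHUB_URL = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def extract_github_repos(text: str) -> list[str]:
    """Return unique, lower-cased 'owner/repo' strings in order of first appearance."""
    seen: dict[str, None] = {}  # insertion-ordered set
    for owner, repo in GITHUB_URL.findall(text):
        repo = repo.rstrip(".")  # drop a sentence-ending period
        if repo.lower().endswith(".git"):
            repo = repo[:-4]  # normalise clone URLs such as .../repo.git
        if repo:
            # GitHub names are case-insensitive, so compare in lower case.
            seen.setdefault(f"{owner}/{repo}".lower(), None)
    return list(seen)
```

For example, a text mentioning both `https://github.com/foo/bar` and `github.com/Foo/Bar.git` yields the single entry `foo/bar`, and deeper links such as `.../foo/bar/tree/master` are reduced to the same repository.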

Results

We observe that the numbers of stars and forks across all repositories each follow a power-law distribution. In the majority of cases, only one of the paper's authors contributes to the repository. The repository manuals (README files) are mostly short, consisting of only a few sentences. The source code is mostly written in Python. Both the papers containing the repository URLs and the papers' authors are typically from the field of artificial intelligence (AI).
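One common way to quantify a power-law observation like the one above is maximum-likelihood estimation of the exponent, following Clauset, Shalizi and Newman (2009). The sketch below uses their approximate estimator for discrete data such as star or fork counts. This is an assumption for illustration, not necessarily the method used in the paper; the function name `powerlaw_alpha` and the choice of `x_min` are hypothetical.

```python
import math
from typing import Iterable


def powerlaw_alpha(counts: Iterable[int], x_min: int) -> float:
    """Estimate alpha of a discrete power law p(x) ~ x^(-alpha) for x >= x_min.

    Approximate discrete MLE from Clauset, Shalizi & Newman (2009):
        alpha ~= 1 + n / sum(ln(x_i / (x_min - 0.5)))
    over the n observations with x_i >= x_min.
    """
    if x_min < 1:
        raise ValueError("x_min must be a positive integer")
    # Only the tail x >= x_min is assumed to follow the power law.
    tail = [x for x in counts if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    # Every log term is positive because x_i >= x_min > x_min - 0.5.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

For example, `powerlaw_alpha(star_counts, x_min=10)` (where the hypothetical `star_counts` holds one star count per repository) would estimate the exponent for repositories with at least 10 stars. The approximation is reported to be accurate only when `x_min` is not too small (roughly 6 or more); in practice `x_min` is usually estimated as well, e.g. by minimizing the Kolmogorov-Smirnov distance between the data and the fitted model.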

More Information & Reference

More information about our work can be found in the following paper:

Michael Färber: "Analyzing the GitHub Repositories of Research Papers". Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL'20), Xi'an, China, 2020.

Please cite this paper when referring to our work.

Acknowledgements

We thank Erhan Metin for his contributions to this work.

Languages

Python: 100.0%