matheus-1618 / Github-Colaborations-Network

GitHub collaborations (commits and PRs) in a graph network


Github-Colaborations-Network

Responsible Team

This is a social network analysis of GitHub collaborations, including commits and pull requests, made between January and February 2015. The goal of this research is to analyze the following hypothesis:

The greater a user's chance of belonging to a community of contributors, the greater that user's focus. Focus is measured from the number of contributions to the repositories that form the network's edges, relative to the user's total number of contributions.

Two definitions are important here:

  • Belonging to the community: A collaboration community exists in the network if the neighbors of a given user are also connected to each other. This indicates that they belong to the same collaboration communities in common or popular repositories, forming a cluster of collaborations among these users. Only collaborations above the average of total collaborations were used for this definition.
  • Focus: The extent to which a user concentrates their contribution time on specific projects, on topics that are at least minimally related or similar. It is defined by the formula below:

The number of a developer's collaborations considered for edge formation, divided by that developer's total collaborations in the data, weighted by the ratio between that same number and the maximum number of edge collaborations of any individual:

$$focus = \frac{edgeContributions_i}{totalContributions_i}\cdot\frac{edgeContributions_i}{maxEdgeContribution}$$
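The two definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the project's notebooks: `build_graph`, `local_clustering` and `focus` are hypothetical names, nodes are assumed to be users, and "collaborations above the average" is read as keeping only user pairs whose collaboration count is strictly above the mean over all pairs. The "belonging to the community" measure is computed as the local clustering coefficient (the share of a user's neighbor pairs that are themselves linked), which matches the description of neighbors sharing an edge.

```python
from itertools import combinations
from statistics import mean


def build_graph(pair_weights: dict[tuple[str, str], int]) -> dict[str, set[str]]:
    """Undirected user graph keeping only pairs with above-average collaborations.

    Assumption: "above the average" is read as weight > mean of all pair weights.
    """
    threshold = mean(pair_weights.values())
    adj: dict[str, set[str]] = {}
    for (u, v), weight in pair_weights.items():
        if weight > threshold:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj: dict[str, set[str]], user: str) -> float:
    """Share of pairs of the user's neighbors that are also linked to each other."""
    neighbors = adj.get(user, set())
    k = len(neighbors)
    if k < 2:  # no neighbor pairs, so no community can be observed
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * linked / (k * (k - 1))


def focus(edge_contribs: dict[str, int], total_contribs: dict[str, int]) -> dict[str, float]:
    """focus_i = (edge_i / total_i) * (edge_i / max edge contribution over all users)."""
    max_edge = max(edge_contribs.values())
    if max_edge == 0:  # nobody contributed to edge-forming repositories
        return {user: 0.0 for user in edge_contribs}
    return {
        user: (edge / total_contribs[user]) * (edge / max_edge)
        for user, edge in edge_contribs.items()
    }
```

For example, a user with 8 edge contributions out of 10, who also holds the maximum of 8, gets a focus of 0.8 · 1.0 = 0.8, while a user with 4 out of 8 gets 0.5 · 0.5 = 0.25. On a real graph, NetworkX's `clustering` function computes the same coefficient.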

Running the project

  1. Getting the data: The first step is getting the data from the GH Archive hub. For this, we used a batch file (for Windows) that downloads the data for the period mentioned above. To get the data, enter the data/ folder and run the script:
cd data/
setup.bat

Wait for the download.
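The contents of setup.bat are not reproduced here. As a cross-platform alternative, a minimal Python sketch of the same download could look like this. It assumes GH Archive's public hourly file pattern (`https://data.gharchive.org/YYYY-MM-DD-H.json.gz`, hours 0 to 23 without zero padding) and an assumed period of 2015-01-01 through 2015-02-28; the exact dates used by the batch file may differ. `gharchive_urls` and `download` are hypothetical helpers.

```python
import urllib.request
from datetime import date, timedelta
from pathlib import Path


def gharchive_urls(start: date, end: date) -> list[str]:
    """One URL per hour for every day from start to end, both inclusive."""
    urls = []
    day = start
    while day <= end:
        for hour in range(24):  # GH Archive names hours 0-23 without zero padding
            urls.append(f"https://data.gharchive.org/{day.isoformat()}-{hour}.json.gz")
        day += timedelta(days=1)
    return urls


def download(urls: list[str], out_dir: Path) -> None:
    """Fetch each archive into out_dir, skipping files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out_dir / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, target)
```

Calling `download(gharchive_urls(date(2015, 1, 1), date(2015, 2, 28)), Path("data"))` would fetch 59 × 24 = 1416 hourly files, so expect a long download; skipping existing files lets an interrupted run resume.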

  2. Transforming it into CSV files: After getting the data, we need to ingest it into CSV files. Open the Concat_data notebook and run its cells to perform this conversion, keeping only the data related to PRs and commits.

  3. Executing the analysis: With the data available locally, you'll need a Docker container to reproduce the network analysis.

    Clone this repository and follow this tutorial.
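The filtering done by the Concat_data notebook in step 2 can be approximated by the sketch below; it is not the notebook's code. It assumes that "PRs and commits" correspond to GH Archive's `PullRequestEvent` and `PushEvent` types, and that each line of a `.json.gz` file is one event with `actor.login`, `repo.name` and `created_at` fields. `events_to_rows`, `write_csv` and the column names are hypothetical.

```python
import csv
import gzip
import json
from pathlib import Path

# Assumption: "PRs and commits" map to these GH Archive event types.
KEPT_TYPES = {"PushEvent", "PullRequestEvent"}


def events_to_rows(path: Path):
    """Yield (user, repo, event_type, created_at) for commit and PR events."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            if event.get("type") in KEPT_TYPES:
                yield (
                    event["actor"]["login"],
                    event["repo"]["name"],
                    event["type"],
                    event["created_at"],
                )


def write_csv(paths, out_csv: Path) -> int:
    """Concatenate the filtered events of several archive files into one CSV."""
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["user", "repo", "type", "created_at"])
        for path in paths:
            for row in events_to_rows(path):
                writer.writerow(row)
                count += 1
    return count
```

Note that this writes one row per event, so a push containing several commits counts once; weighting by the number of commits in a push would require reading the event payload as well.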

After these steps, you can run the analysis locally.

Another context

To test our hypothesis in a different context and see its implications, we also incorporated Reddit data, which has similar concepts of communities (subreddits) and collaborations (comments). This analysis can be found in the Reddit/ folder.
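In either context, one way to check the hypothesis ("the greater the chance of belonging to a community, the greater the focus") is a rank correlation between each user's community score and focus score. The sketch below uses the Spearman coefficient, which fits a monotonic "the greater..., the greater..." claim; this choice is an assumption and not necessarily the test used in the notebooks. `average_ranks` and `spearman` are hypothetical helpers; `scipy.stats.spearmanr` computes the same coefficient.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Given two lists aligned by user (for example clustering coefficients and focus scores), a clearly positive coefficient would support the hypothesis, while values near zero or negative would not.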

@Insper, 2023.
