alyssafrazee / github_analysis

code for analysis of github repo ownership and gender

Home Page: http://alyssafrazee.com/gender-and-github-code.html

gender of GitHub repo owners


This repository contains the code and data used to analyze the gender breakdown of owners of public GitHub repositories. I wrote a blog post about what I found.

To reproduce the analysis, run scripts in the following order:

  • get_github_info_byday.py: uses the GitHub API to scrape repository data. (Note: this takes roughly 60 hours to run.)
  • merge_files.sh: puts all scraped data together in a big text file
  • make_database.R: dumps the scraped data into a SQLite database
  • analyze_data.py: processes the data
  • bargraph.js: JavaScript/D3 code used to make the graphic showing the results. Alex Wilson made major contributions to this code.
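
The order above can be sketched as a small driver script. This is only an illustration: the README gives the script names and their order, but the interpreters (`python`, `bash`, `Rscript`) and the assumption that each script runs without arguments from the repository root are guesses. bargraph.js is left out because it is browser-side code, not a pipeline step.

```python
import subprocess

# Assumed invocations: only the script names and their order come from the
# README; the interpreters and the lack of arguments are guesses.
PIPELINE = [
    ["python", "get_github_info_byday.py"],  # scrape via the GitHub API (roughly 60 hours)
    ["bash", "merge_files.sh"],              # merge scraped data into one text file
    ["Rscript", "make_database.R"],          # load the merged data into SQLite
    ["python", "analyze_data.py"],           # process the data
]


def run_pipeline(steps=PIPELINE, dry_run=True):
    """Run each step in order; check=True stops at the first failing step."""
    executed = []
    for cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        executed.append(cmd[-1])
    return executed


if __name__ == "__main__":
    # Dry run by default so the 60-hour scrape is never started by accident.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the steps for real. Stopping at the first failure matters here because each step consumes the previous step's output.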

data

The data I scraped in get_github_info_byday.py and processed with merge_files.sh and make_database.R is available in a .db file here. I removed all repo owner last names.
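
If you download the .db file and skip the scraping steps, a quick way to see what it contains is to list its tables. The sketch below uses only Python's standard `sqlite3` module. The table names are not documented in this README, so the code discovers them instead of assuming any, and the file name in the usage note is hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("github.db")` would return the table names, where `github.db` stands in for wherever you saved the download. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty list may mean the path is wrong.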

dependencies

Python libraries: PyGithub, Unidecode, Pandas, SexMachine, Matplotlib. Make sure these are installed before running scripts. See requirements.txt for a more detailed specification of Python dependencies, including versions.
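
A quick pre-flight check for the Python libraries might look like the sketch below. The import names (for example `github` for PyGithub and `sexmachine` for SexMachine) are assumptions based on how these packages are normally imported. They are not taken from this repository, and the check does not verify the versions pinned in requirements.txt.

```python
from importlib.util import find_spec

# Package name -> assumed top-level import name (not stated in the README).
REQUIRED = {
    "PyGithub": "github",
    "Unidecode": "unidecode",
    "Pandas": "pandas",
    "SexMachine": "sexmachine",
    "Matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python dependencies were found.")
```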

R packages: devtools, proto, DBI, chron, RSQLite, and RSQLite.extfuns. All can be installed from CRAN in R using the install.packages function.

Languages

JavaScript 60.7%, Python 29.0%, R 8.4%, Shell 1.9%