gbaf / git-rater

A machine learning model that tries to predict the quality of your GitHub profile

github-rater

Medium article: coming soon
Dataset: https://www.kaggle.com/conorsully1/github-ratings

File: Description
save.py: Collection of functions used to scrape and save GitHub profile HTML
capture.py: Collection of functions used to extract specific elements from the HTML and create features
collect_data.ipynb: Notebook used to collect GitHub profile data for a given list of users
feature_engineering.ipynb: Notebook used to create model features from the saved profile data
decision_tree.ipynb: Notebook used to build a decision tree that predicts the ratings of GitHub profiles
random_forest.ipynb: Notebook used to build a random forest that predicts the ratings of GitHub profiles

How to

Web scraper

Note: this code is outdated and will need to be updated to run on the new GitHub layout.
First, download the HTML of a user's profile:

import save as sv  # save HTML
import capture as cp  # obtain features

user = "conorosully"
path = "../data/test/"

# Save the profile HTML to the folder
sv.save_all(user, path)
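
The internals of save.py are not shown here, so the following is only a rough sketch of what a download-and-save step of this kind might look like, using the standard library alone. The names profile_url, fetch_html and save_profile_html, as well as the file naming scheme, are assumptions made for illustration and are not the repo's actual API. The fetch parameter is also an assumption; it lets the download step be replaced with a stub when no network is available.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def profile_url(user: str, tab: str = "") -> str:
    """Build a public profile URL, optionally for a tab such as 'followers'."""
    base = f"https://github.com/{user}"
    return f"{base}?tab={tab}" if tab else base


def fetch_html(url: str) -> str:
    """Download a page and return its body as text."""
    request = Request(url, headers={"User-Agent": "profile-html-sketch"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def save_profile_html(user: str, path: str, tab: str = "", fetch=fetch_html) -> Path:
    """Fetch one profile page and write it below `path` (hypothetical naming)."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    name = f"{user}_{tab}.html" if tab else f"{user}.html"
    target = folder / name
    target.write_text(fetch(profile_url(user, tab)), encoding="utf-8")
    return target
```

Calling save_profile_html("conorosully", "../data/test/", tab="followers") would, under these assumptions, write a single file named conorosully_followers.html to that folder.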

Then the features can be extracted:

# Obtain features from the saved HTML
counts = cp.get_counts(user, path)
followers = cp.get_friends(user, path, "followers")
following = cp.get_friends(user, path, "following")
repos = cp.get_repos(user, path)
cont = cp.get_contributions(user, path)

License: MIT License


Languages

Jupyter Notebook 99.4%, Python 0.6%