IBM / github-crawler

Extract GitHub repositories metadata and README content.

Home Page: https://ibm.github.io/github-crawler/


github-crawler

STEPS:

  1. Set up the environment and install the packages

Virtual env

```sh
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# when you have finished
deactivate
```

Conda

```sh
conda env create -f conda.yaml
conda activate crawler
# when you have finished
conda deactivate
```
  2. Update the .env file with the correct parameters:

    cp .env.example .env
    code .env
  3. Run the following scripts:

    i. `python crawl_repos.py <topic-name> <stars-size>` to crawl all the repos that have the given topic and at least `<stars-size>` stars. If `<stars-size>` is omitted, repos with 0 or more stars are considered.

    ii. `python get_contributors.py` to crawl all the users who contributed to the repos crawled in step 3.i.

    iii. `python get_stargazers.py` to crawl all the users who starred the repos crawled in step 3.i.
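
The three scripts above can be chained so that the contributor and stargazer crawls only start once the repo crawl has succeeded. The following is a minimal sketch, not part of the project: the `run_crawl` function and the `PYTHON` variable are hypothetical helpers, and the topic and star values in the usage line are illustrative. It assumes you run it from the repository root with the environment from step 1 activated.

```sh
# Interpreter to use; set PYTHON=echo to print the commands instead of running them.
PYTHON="${PYTHON:-python}"

# Run the three scripts in order, stopping at the first one that fails.
# $1: topic name
# $2: minimum star count; 0 is passed when omitted, matching the documented
#     "0 or more stars" behavior.
run_crawl() {
    topic="$1"
    min_stars="${2:-0}"
    "$PYTHON" crawl_repos.py "$topic" "$min_stars" &&
    "$PYTHON" get_contributors.py &&
    "$PYTHON" get_stargazers.py
}
```

For example, `run_crawl machine-learning 100` would crawl repos tagged `machine-learning` with at least 100 stars, then their contributors, then their stargazers.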

About

License: Apache License 2.0


Languages

Python 97.5%, Dockerfile 2.0%, Shell 0.5%