github-crawler

Extract GitHub repository metadata and README content.

Steps:

1. Set up a Python environment using either `venv` (first block) or conda (second block):

```sh
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# when you are done
deactivate
```

```sh
conda env create -f conda.yaml
conda activate crawler
# when you are done
conda deactivate
```

2. Copy `.env.example` to `.env` and update it with the correct parameters:
```sh
cp .env.example .env
code .env  # opens the file in VS Code; any text editor works
```
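
The contents of `.env.example` are not reproduced here. As a minimal sketch, assuming the file holds simple `KEY=VALUE` lines (for example a hypothetical `GITHUB_TOKEN` key), the stdlib-only loader below shows how such parameters can be read into the process environment. `load_env_file` is a hypothetical helper, not part of this repository; the scripts may instead rely on a library such as python-dotenv (`from dotenv import load_dotenv`).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Hypothetical helper: parse simple KEY=VALUE lines from `path` into os.environ.

    Blank lines, '#' comments and lines without '=' are skipped, and matching
    surrounding quotes are stripped. Variables already set in the environment
    are kept unless `override` is True.
    """
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

After `load_env_file()`, a script could read its (assumed) token with `os.environ.get("GITHUB_TOKEN")`. Leaving existing environment variables untouched by default lets a value exported in the shell take precedence over the file.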
3. Run the following scripts in order:

i. `python crawl_repos.py <topic-name> <stars-size>` crawls all repositories tagged with `<topic-name>` that have at least `<stars-size>` stars. If `<stars-size>` is omitted, repositories with any number of stars (0 or more) are included.

ii. `python get_contributors.py` crawls all users who contributed to the repositories crawled in step 3.i.

iii. `python get_stargazers.py` crawls all users who starred the repositories crawled in step 3.i.
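
The scripts' internals are not shown in this README. The following is a minimal sketch, assuming they talk to the GitHub REST API (repository search, contributors, stargazers and readme endpoints) using only the standard library. Every function name here (`build_search_url`, `build_users_url`, `decode_readme`, `get_json`, `fetch_users`) is hypothetical and not taken from `crawl_repos.py`, `get_contributors.py` or `get_stargazers.py`.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Hypothetical helpers illustrating the crawl; not the repository's own code.
API_ROOT = "https://api.github.com"
PER_PAGE = 100  # largest page size GitHub accepts on these endpoints


def build_search_url(topic: str, min_stars: int = 0, page: int = 1) -> str:
    """Search API URL for repos tagged `topic` with at least `min_stars` stars."""
    query = f"topic:{topic} stars:>={min_stars}"
    params = urllib.parse.urlencode({"q": query, "per_page": PER_PAGE, "page": page})
    return f"{API_ROOT}/search/repositories?{params}"


def build_users_url(full_name: str, kind: str, page: int = 1) -> str:
    """URL listing a repo's contributors or stargazers; `full_name` is 'owner/repo'."""
    if kind not in ("contributors", "stargazers"):
        raise ValueError(f"unsupported kind: {kind!r}")
    params = urllib.parse.urlencode({"per_page": PER_PAGE, "page": page})
    return f"{API_ROOT}/repos/{full_name}/{kind}?{params}"


def decode_readme(payload: dict) -> str:
    """Decode the base64 'content' field of a GET /repos/{owner}/{repo}/readme response."""
    # b64decode discards the embedded newlines GitHub inserts into the content.
    return base64.b64decode(payload["content"]).decode("utf-8")


def get_json(url: str, token: Optional[str] = None):
    """GET `url` and parse the JSON body, authenticating if a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_users(full_name: str, kind: str, token: Optional[str] = None) -> List[str]:
    """Collect logins page by page until a page comes back short or empty."""
    logins: List[str] = []
    page = 1
    while True:
        batch = get_json(build_users_url(full_name, kind, page), token)
        logins.extend(user["login"] for user in batch)
        if len(batch) < PER_PAGE:
            return logins
        page += 1
```

For example, `get_json(build_search_url("machine-learning", 100), token)["items"]` would return one page of repository metadata dicts, each with a `full_name` that can be passed to `fetch_users(full_name, "stargazers", token)`. The Search API returns at most 1,000 results per query, so a very broad topic may need to be split, for instance into several star ranges. Unauthenticated requests are heavily rate-limited, which is one reason to supply a token.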


License: Apache License 2.0
