- scrape.py: Web crawler written using Scrapy
- data_< i >.json: the relevant metric collected by the crawler for a run over < i > links
- Install Rust, needed to build the cryptography package that Scrapy requires
https://rustup.rs
- Install cryptography
pip3 install cryptography
If the install fails on an Apple Silicon (M1) Mac, the steps here may be required: https://stackoverflow.com/questions/66035003/installing-cryptography-on-an-apple-silicon-m1-mac
- Install nltk and Beautiful Soup
pip3 install nltk BeautifulSoup4
- Update pip if needed
python3 -m pip install --upgrade pip
- Install Scrapy
pip3 install Scrapy
- Install SQLite
brew install sqlite
- Install matplotlib
pip3 install matplotlib
- Set the max links variable in
scrape.py
- Run the crawler
python3 -m scrapy runspider scrape.py
- Record the relevant metric in the data_< i >.json files. To plot it, run
python3 plot.py
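
Before running the crawler, it can help to confirm that the Python packages from the install steps are importable. The following is a minimal sketch and not part of this repository: the file name check_deps.py and the function missing_modules are hypothetical. It assumes the usual import names for each package (BeautifulSoup4 is imported as bs4, Scrapy as scrapy). SQLite installed through Homebrew is not a Python package, so it is not checked here.

```python
# check_deps.py (hypothetical helper, not part of this repository)
import importlib.util

# Import names for the pip packages installed above.
# Note: the BeautifulSoup4 distribution is imported as "bs4".
REQUIRED_MODULES = ["cryptography", "nltk", "bs4", "scrapy", "matplotlib"]


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If saved under that name, it could be run with
python3 check_deps.py
Any module it reports as missing can be installed with the matching pip3 command above.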