nabaskes / Pitchforker

Web crawler to mine album review scores and metadata from pitchfork.com

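To install, first clone the repository, then create a virtual environment inside the checkout: virtualenv -p python3 . (the trailing dot targets the current directory). This requires Python 3.6.1 or later.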

Then activate the environment and install the dependencies: . bin/activate followed by pip install -r requirements.txt

To create the database: sqlite3 albums.db

To create the tables: python3 models.py
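As a rough illustration of what a table-creation step like this can look like, here is a minimal sketch using Python's built-in sqlite3 module. The albums table and its columns (artist, title, score) are assumptions made for the example, not the schema that models.py actually defines.

```python
import sqlite3


def create_tables(conn):
    """Create a hypothetical albums table if it does not already exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS albums (
            id INTEGER PRIMARY KEY,
            artist TEXT NOT NULL,
            title TEXT NOT NULL,
            score REAL
        )
        """
    )
    conn.commit()


def table_names(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # An in-memory database keeps the demo from touching albums.db.
    demo = sqlite3.connect(":memory:")
    create_tables(demo)
    print(table_names(demo))
    demo.close()
```

Because the statement uses IF NOT EXISTS, running it twice against the same database is harmless.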

To scrape the first ten pages of the high-scoring albums listing: python3 scraper.py

To scrape some other page: python3 scraper.py $URL

To scrape deeper or shallower into the high-scoring albums pages: python3 scraper.py 100
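The three invocations above (no argument, a URL, or a number) could be told apart with a small dispatcher like the one sketched below. This is not taken from scraper.py: the parse_args helper, the DEFAULT_URL value, and the reading of the number as a page count are all assumptions made for illustration.

```python
import sys

# Assumed values for illustration only; the real defaults live in scraper.py.
DEFAULT_URL = "https://pitchfork.com/best/high-scoring-albums/"
DEFAULT_PAGES = 10


def parse_args(argv):
    """Return (start_url, pages) for a hypothetical scraper command line.

    No argument   -> default listing, default page count
    Numeric value -> default listing, that many pages
    Anything else -> treated as the URL to start from
    """
    if not argv:
        return DEFAULT_URL, DEFAULT_PAGES
    arg = argv[0]
    if arg.isdigit():
        return DEFAULT_URL, int(arg)
    return arg, DEFAULT_PAGES


if __name__ == "__main__":
    url, pages = parse_args(sys.argv[1:])
    print(f"would scrape {pages} page(s) starting at {url}")
```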

If you want to crawl deeper from a single starting point, change the global variable RECURSION_DEPTH at the start of scraper.py
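To show how a depth limit like RECURSION_DEPTH bounds a crawl, here is a minimal sketch. It uses an iterative breadth-first walk rather than recursion, and it reads links from a caller-supplied function instead of the network, so it is not the scraper's own implementation. The crawl function, the get_links parameter, and the default value of 2 are hypothetical.

```python
from collections import deque

RECURSION_DEPTH = 2  # assumed default for this sketch


def crawl(start, get_links, max_depth=RECURSION_DEPTH):
    """Visit pages breadth-first, following links at most max_depth hops.

    get_links(url) must return an iterable of URLs linked from that page.
    Returns the visited URLs in the order they were reached.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    # A tiny fake link graph standing in for real pages.
    graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
    print(crawl("a", lambda u: graph.get(u, [])))
```

Raising the limit by one lets the walk follow one more hop of links from the starting page.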

If you want to use more or fewer Python threads, change MAX_WORKERS at the start of scraper.py
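For a sense of how a worker cap such as MAX_WORKERS typically limits concurrency, here is a minimal sketch built on concurrent.futures.ThreadPoolExecutor from the standard library. Whether scraper.py uses this module is an assumption; scrape_all, the fetch parameter, and the default of 4 are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumed default for this sketch


def scrape_all(urls, fetch, max_workers=MAX_WORKERS):
    """Apply fetch to every URL using at most max_workers threads.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # A stand-in for a real download function, so the demo needs no network.
    def fake_fetch(url):
        return len(url)

    print(scrape_all(["a", "bb", "ccc"], fake_fetch))
```

More workers mean more pages in flight at once, at the cost of heavier load on the site being scraped.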

License: MIT License