bkamapantula / idlebrain-reviews

scraper for movie reviews from idlebrain.com

Idlebrain reviews

This repository contains scripts that scrape and parse movie reviews from idlebrain.com.

We explore the distribution of reviews across the years.
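Once the movies table described below is populated, a per-year count is one simple way to look at that distribution. The sketch below is not the repository's analysis code. It assumes that `release_date` holds free text containing a four-digit year (for example "12 March 2004"), which may not match the real data. `count_years` and `reviews_per_year` are hypothetical helper names.

```python
import re
import sqlite3
from collections import Counter

# Assumption: release_date is free text that contains a four-digit year.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def count_years(release_dates):
    """Count how many entries fall in each year; unparseable dates are skipped."""
    counts = Counter()
    for value in release_dates:
        match = YEAR.search(value or "")  # NULL release dates arrive as None
        if match:
            counts[int(match.group(0))] += 1
    return dict(sorted(counts.items()))


def reviews_per_year(db_path="reviews.db"):
    """Read release_date from the movies table and return {year: count}."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT release_date FROM movies").fetchall()
    finally:
        conn.close()
    return count_years(date for (date,) in rows)
```

Keeping the counting in a pure function makes it easy to check against a handful of hand-written dates before running it on the full table.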

Install dependencies

pip install -e .  # picks up the packages listed in requirements.txt and installs them

Fetch links from archive page

The Idlebrain archive page lists the reviewed movies, with a link to each review.

Step 1 - Create database

Create a reviews.db file. Open it with SqliteBrowser and create a movies table with the following schema:

CREATE TABLE `movies` ( `id` INTEGER, `name` TEXT, `url` TEXT, `release_date` TEXT, `rating` TEXT )
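If you prefer not to use a GUI, the same table can be created with Python's standard `sqlite3` module. This is a minimal sketch that is not part of the repository; `create_database` is a hypothetical helper name, and `IF NOT EXISTS` is added so the script can be re-run safely.

```python
import sqlite3

# Same columns as the CREATE TABLE statement above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS `movies` (
    `id` INTEGER,
    `name` TEXT,
    `url` TEXT,
    `release_date` TEXT,
    `rating` TEXT
)
"""


def create_database(path="reviews.db"):
    """Create (or open) the SQLite file and make sure the movies table exists."""
    conn = sqlite3.connect(path)
    with conn:  # commits the statement on success
        conn.execute(SCHEMA)
    return conn
```

Calling `create_database()` creates reviews.db in the current working directory, so run it from the repository root.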

Step 2 - Populate movies table

Next, fetch the archive list and save the links in the movies table of reviews.db. To do this, run parse_reviews() in parse.py.

This adds the movie entries to the movies table.
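The body of parse_reviews() is not shown in this README. As a rough sketch of the same idea, the snippet below uses only the standard library's `html.parser` to pull (name, url) pairs out of archive HTML that has already been downloaded, then inserts them into the movies table. `ArchiveLinkParser`, `store_links` and `save_archive_links` are hypothetical names, and treating every `<a href>` on the page as a movie link is an assumption; the real page will probably need extra filtering.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArchiveLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs from <a href="..."> tags.

    Assumption: every anchor with an href on the archive page is a movie link.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the archive page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = " ".join("".join(self._text).split())  # collapse whitespace
            if name:
                self.links.append((name, self._href))
            self._href = None


def store_links(conn, links):
    """Insert (name, url) pairs; id, release_date and rating are left NULL."""
    with conn:
        conn.executemany("INSERT INTO movies (name, url) VALUES (?, ?)", links)


def save_archive_links(html, base_url, db_path="reviews.db"):
    """Parse downloaded archive HTML and append its links to the movies table."""
    parser = ArchiveLinkParser(base_url)
    parser.feed(html)
    parser.close()
    conn = sqlite3.connect(db_path)
    try:
        store_links(conn, parser.links)
    finally:
        conn.close()
    return len(parser.links)
```

`base_url` should be the address of the archive page the HTML came from, so that relative review links are stored as full URLs.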

Fetch reviews for all movies

Step 1 - Create data directory

Create a data directory in the repository root.

Step 2 - Scrape movie reviews

Run fetch_data_from_IB() in parse.py. This saves each review as movie_name.html in the data/ directory.
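fetch_data_from_IB() itself is not reproduced here. A hedged sketch of this step could read each url from the movies table, download it with `urllib.request`, and write the raw bytes to data/<name>.html, creating the data directory if it is missing. `save_reviews`, `safe_filename` and the injectable `fetch` parameter are hypothetical, and the filename rule is an assumption; the real script may name files differently.

```python
import re
import sqlite3
import urllib.request
from pathlib import Path


def fetch_html(url, timeout=30):
    """Download a page and return its raw bytes (encoding left untouched)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def safe_filename(name):
    """Turn a movie name into a filesystem-friendly stem (assumed naming rule)."""
    stem = re.sub(r"[^\w-]+", "_", name.strip()).strip("_")
    return stem or "untitled"


def save_reviews(db_path="reviews.db", data_dir="data", fetch=fetch_html):
    """Save every movie URL in the movies table as data_dir/<name>.html."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name, url FROM movies WHERE url IS NOT NULL"
        ).fetchall()
    finally:
        conn.close()

    out = Path(data_dir)
    out.mkdir(exist_ok=True)  # equivalent to creating data/ by hand
    saved = []
    for name, url in rows:
        # Note: two movies with the same sanitized name would overwrite each other.
        path = out / f"{safe_filename(name)}.html"
        path.write_bytes(fetch(url))
        saved.append(path)
    return saved
```

Passing a different `fetch` function makes it possible to add a delay between requests or to test the loop without touching the network.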

About


License: MIT License
