ed12rivera / Movies-ETL

Create an ETL pipeline to clean large JSON and CSV files and load them into a SQL database for a local hackathon.

Movies-ETL

In this project we create a clean dataset for a hackathon whose goal is to predict which low-budget movies will be successful.

We extracted data from two sources: a Wikipedia scrape covering all movies released since 1990, and ratings data from the MovieLens website. We then cleaned the extracted data to make it easier to work with, and finally loaded it into a PostgreSQL database.
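The extract, clean, and load stages described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the project's notebook code. The sample records, the field names (`title`, `year`, `budget`, `userId`, `rating`), the `parse_budget` helper, and the table schema are all assumptions. The standard-library `sqlite3` module stands in for PostgreSQL so the sketch runs without a database server.

```python
import csv
import io
import json
import re
import sqlite3

# Hypothetical sample data standing in for the real Wikipedia JSON and MovieLens CSV.
# Field names and values are assumptions for illustration only.
WIKI_JSON = """[
  {"title": " Movie A ", "year": 1995, "budget": "$1.5 million"},
  {"title": "Movie B", "year": 1985, "budget": "$2 million"},
  {"title": "Movie C", "year": 2001, "budget": null}
]"""
RATINGS_CSV = "userId,title,rating\n1,Movie A,4.0\n2,Movie A,3.0\n1,Movie C,5.0\n"


def extract(json_text, csv_text):
    """Parse the raw JSON movie records and the CSV rating rows."""
    movies = json.loads(json_text)
    ratings = list(csv.DictReader(io.StringIO(csv_text)))
    return movies, ratings


def parse_budget(raw):
    """Hypothetical helper: turn "$1.5 million" or "$250,000" into a float, else None."""
    if not raw:
        return None
    match = re.search(r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion)?", raw, re.IGNORECASE)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get((match.group(2) or "").lower(), 1)
    return value * scale


def transform(movies, ratings):
    """Keep movies released since 1990, strip titles, and normalize budgets and ratings."""
    clean_movies = [
        (m["title"].strip(), int(m["year"]), parse_budget(m.get("budget")))
        for m in movies
        if m.get("year") is not None and int(m["year"]) >= 1990
    ]
    clean_ratings = [(r["title"].strip(), float(r["rating"])) for r in ratings]
    return clean_movies, clean_ratings


def load(conn, movies, ratings):
    """Create the (assumed) tables and bulk-insert the cleaned rows."""
    conn.execute("CREATE TABLE movies (title TEXT PRIMARY KEY, year INTEGER, budget REAL)")
    conn.execute("CREATE TABLE ratings (title TEXT, rating REAL)")
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)
    conn.executemany("INSERT INTO ratings VALUES (?, ?)", ratings)
    conn.commit()


# In-memory SQLite database used here in place of PostgreSQL.
conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract(WIKI_JSON, RATINGS_CSV)))
rows = conn.execute(
    "SELECT m.title, m.budget, AVG(r.rating) FROM movies m "
    "LEFT JOIN ratings r ON r.title = m.title GROUP BY m.title ORDER BY m.title"
).fetchall()
print(rows)  # [('Movie A', 1500000.0, 3.5), ('Movie C', None, 5.0)]
```

To target PostgreSQL instead, the `load` step would use a driver such as psycopg2, whose query placeholders are `%s` rather than sqlite3's `?`. Joining ratings to movies by title is a simplification: MovieLens ratings reference movies by a numeric `movieId`. Real Wikipedia budget strings are also far more varied than this single regular expression handles.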


Languages

Jupyter Notebook 98.8%, Python 1.2%