susiexia / Movies-ETL

Automated ETL (Extract, Transform, and Load) pipeline.

Movies-ETL

Extract, transform, and load (ETL) data from Wikipedia (JSON), Kaggle (CSV), and MovieLens ratings (CSV) into PostgreSQL, performing the transformation step with Python and pandas.

Resources:

  • An unstructured, web-scraped JSON file of over 5,000 movies from 1990 to 2019, from Wikipedia (wikipedia.movies.json)

  • An unstructured CSV file from Kaggle (movies_metadata.csv)

  • A large unstructured CSV file from MovieLens with movie rating information (ratings.csv)
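
As a rough illustration of the extract step, the sketch below reads one JSON source and two CSV sources into lists of records. It is not the repository's code: it uses only the Python standard library and tiny in-memory samples so it runs without the real files, and the `extract` function, the sample rows, and the column names are assumptions made for the example. With pandas, the same step would typically go through `pd.read_csv` and `pd.DataFrame`.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for the real files (hypothetical sample rows).
WIKI_JSON = '[{"title": "Example Film", "year": 1995, "Box office": "$21.4 million"}]'
KAGGLE_CSV = "id,title,budget\n1,Example Film,30000000\n"
RATINGS_CSV = "userId,movieId,rating,timestamp\n1,1,4.0,1425941529\n"


def extract(wiki_file, kaggle_file, ratings_file):
    """Read the three sources into lists of dicts (hypothetical helper)."""
    wiki_movies = json.load(wiki_file)                   # JSON array of movie records
    kaggle_metadata = list(csv.DictReader(kaggle_file))  # one dict per CSV row
    ratings = list(csv.DictReader(ratings_file))         # values stay as strings
    return wiki_movies, kaggle_metadata, ratings


wiki_movies, kaggle_metadata, ratings = extract(
    io.StringIO(WIKI_JSON), io.StringIO(KAGGLE_CSV), io.StringIO(RATINGS_CSV)
)
print(len(wiki_movies), len(kaggle_metadata), len(ratings))
```

To run it against the real data, pass open file handles for wikipedia.movies.json, movies_metadata.csv, and ratings.csv instead of the `io.StringIO` samples.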

Outputs:

Challenge

Goals:

  • Create an automated ETL pipeline.

  • Extract data from multiple sources.

  • Clean and transform the data automatically using Pandas and regular expressions.

  • Load new data into existing tables in PostgreSQL.
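
The transform and load goals above might look something like the following sketch. Several things here are assumptions rather than the repository's actual code: the `parse_dollars` and `load` helpers, the two dollar-amount formats the regular expressions accept, the `movies` table layout, and the choice to clear the existing table before inserting. An in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```python
import re
import sqlite3

# Assumed format: "$21.4 million" or "$1.2 billion".
WORD_FORM = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
# Assumed format: "$123,456,789".
NUMBER_FORM = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+)")


def parse_dollars(text):
    """Convert a dollar string to a float, or None if it cannot be parsed."""
    if not isinstance(text, str):
        return None
    match = WORD_FORM.search(text)
    if match:
        scale = 1e6 if match.group(2).lower() == "million" else 1e9
        return float(match.group(1)) * scale
    match = NUMBER_FORM.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None


def load(conn, rows):
    """Refill the existing table: delete old rows, then insert the new ones."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM movies")
        conn.executemany(
            "INSERT INTO movies (title, box_office) VALUES (?, ?)", rows
        )


# Hypothetical raw records, as they might come out of the extract step.
raw = [
    {"title": "Example Film", "Box office": "$21.4 million"},
    {"title": "Another Film", "Box office": "$123,456,789"},
    {"title": "No Data Film"},
]
rows = [(m["title"], parse_dollars(m.get("Box office"))) for m in raw]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL
conn.execute("CREATE TABLE movies (title TEXT, box_office REAL)")
conn.execute("INSERT INTO movies VALUES ('Stale Row', 0)")
load(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(count)
```

Deleting rows instead of dropping the table keeps the existing schema in place, which is one way to read "load new data into existing tables". Appending without deleting would be another.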

Challenge Outputs:

Languages

Jupyter Notebook 87.4%, Python 12.6%