python pandas sqlalchemy-python postgresql regular-expression etl-pipeline

Movies-ETL

Extract, Transform, and Load (ETL) movie data from Wikipedia (JSON file), Kaggle (CSV file), and MovieLens ratings (CSV file) into PostgreSQL. The transformation step is performed with Python and pandas.

Resources:

An unstructured, web-scraped JSON file of over 5,000 movies from 1990 to 2019, from Wikipedia (wikipedia.movies.json)
An unstructured CSV file of movie metadata from Kaggle (movies_metadata.csv)
A large unstructured CSV file from MovieLens with movie rating information (ratings.csv)
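
The extract step for these three files could look roughly like the sketch below. It is not taken from movies_ETL.py: the function name extract and its parameters are hypothetical, and it assumes the Wikipedia JSON file holds a list of movie records (one dict per movie).

```python
import json

import pandas as pd


def extract(wiki_path, kaggle_path, ratings_path):
    """Read the three raw sources into DataFrames (illustrative sketch)."""
    # Assumption: the JSON file is a list of dicts, one per movie.
    with open(wiki_path, mode="r", encoding="utf-8") as f:
        wiki_movies_raw = json.load(f)
    wiki_movies = pd.DataFrame(wiki_movies_raw)

    # low_memory=False makes pandas infer each column's dtype from the
    # whole file, which helps with columns that mix types.
    kaggle_metadata = pd.read_csv(kaggle_path, low_memory=False)

    ratings = pd.read_csv(ratings_path)
    return wiki_movies, kaggle_metadata, ratings
```

A call such as extract("wikipedia.movies.json", "movies_metadata.csv", "ratings.csv") would return the three DataFrames, assuming the files sit in the working directory.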

Outputs:

ETL Python script: movies_ETL.py
ETL Jupyter Notebook: movies_ETL.ipynb

Challenge

Goals:

Create an automated ETL pipeline.
Extract data from multiple sources.
Clean and transform the data automatically using Pandas and regular expressions.
Load new data into existing tables in PostgreSQL.
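
As a rough illustration of the transform and load goals above, the sketch below cleans a dollar-amount string with regular expressions and appends a DataFrame to a table that already exists. It is an assumption-laden example, not the code in challenge.py: the names parse_dollars and load_append, the two regex patterns, and the chunk size are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical patterns: "$1.5 million" style and "$123,456,789" style.
_WORD_FORM = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
_NUMBER_FORM = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+|\d+)")


def parse_dollars(text):
    """Convert a dollar string to a float; return NaN when nothing matches."""
    if not isinstance(text, str):
        return float("nan")
    match = _WORD_FORM.search(text)
    if match:
        scale = 1e6 if match.group(2).lower() == "million" else 1e9
        return float(match.group(1)) * scale
    match = _NUMBER_FORM.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return float("nan")


def load_append(df, table_name, con, chunksize=100_000):
    """Append rows to an existing table instead of replacing it."""
    # if_exists="append" keeps the existing table and its rows;
    # chunksize writes large frames (such as ratings) in batches.
    df.to_sql(table_name, con, if_exists="append", index=False, chunksize=chunksize)
```

For PostgreSQL, con would typically be a SQLAlchemy engine created with sqlalchemy.create_engine and a connection string for the target database; the connection details are not given in this README, so they are left out here.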

Challenge Outputs:

Challenge automated pipeline Python script: challenge.py
Challenge automated pipeline Jupyter Notebook: challenge.ipynb

About

Automated ETL (Extract, Transform, and Load) pipeline.

Languages

Jupyter Notebook 87.4%
Python 12.6%