Extract, transform, and load (ETL) movie data from Wikipedia (JSON), Kaggle (CSV), and MovieLens ratings (CSV) into PostgreSQL, performing the transformation step in Python with Pandas.
- An unstructured JSON file of over 5,000 movies from 1990 to 2019, web-scraped from Wikipedia (wikipedia.movies.json)
- A raw CSV file of movie metadata from Kaggle (movies_metadata.csv)
- A large raw CSV file of movie rating information from MovieLens (ratings.csv)
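The extract step for these three sources can be sketched with the standard library alone. This is a minimal sketch, not the project's script: `extract_wiki`, `extract_csv` and `stream_csv` are hypothetical names, and it assumes wikipedia.movies.json holds a single JSON array of movie objects and that both CSV files have a header row. The large ratings file is streamed row by row instead of being held in memory at once. In a Pandas-based pipeline the same files would typically be read with `pd.read_json` or `pd.read_csv`, whose `chunksize` argument serves the same purpose for large files.

```python
from __future__ import annotations

import csv
import json
from typing import Iterator


def extract_wiki(path: str) -> list[dict]:
    """Load the Wikipedia JSON file, assumed to be one JSON array of movie objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_csv(path: str) -> list[dict]:
    """Load a moderately sized CSV file (e.g. movies_metadata.csv) into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def stream_csv(path: str) -> Iterator[dict]:
    """Yield one row at a time so a large file such as ratings.csv is never fully in memory."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)
```

Note that `csv.DictReader` returns every value as a string; converting types is left to the transform step.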
- ETL Python script: movies_ETL.py
- ETL Jupyter Notebook: movies_ETL.ipynb
- Create an automated ETL pipeline.
- Extract data from multiple sources.
- Clean and transform the data automatically using Pandas and regular expressions.
- Load the new data into existing tables in PostgreSQL.
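Cleaning with regular expressions can be illustrated on dollar amounts. As an assumption for illustration only, suppose a scraped field such as box office or budget holds free text like "$123.4 million", "$1.2 billion" or "$123,456,789". The hypothetical `parse_dollars` helper below recognizes those two forms with two patterns, converts them to a float, and returns `None` for anything else.

```python
import re
from typing import Optional

# Illustrative patterns (assumptions, not taken from the project).
# Form one: "$123.4 million", "$ 1.2 Billion"
FORM_ONE = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
# Form two: "$123,456,789" (comma-separated thousands)
FORM_TWO = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+)")

SCALE = {"million": 1e6, "billion": 1e9}


def parse_dollars(text: object) -> Optional[float]:
    """Convert a free-text dollar amount to a float; return None if it cannot be parsed."""
    if not isinstance(text, str):  # NaN, lists and other non-strings from scraped data
        return None
    match = FORM_ONE.search(text)
    if match:
        return float(match.group(1)) * SCALE[match.group(2).lower()]
    match = FORM_TWO.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None
```

With Pandas, such a helper could be applied column-wise, for example `df["box_office"] = df["box_office"].map(parse_dollars)` (the column name is hypothetical). Returning `None` instead of raising keeps an automated run going on unexpected values, which can then be inspected or dropped.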
- Challenge automated pipeline Python script: challenge.py
- Challenge automated pipeline Jupyter Notebook: challenge.ipynb
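An automated pipeline of this kind can be sketched as three pluggable steps. The sketch is an assumption, not the challenge script: `run_pipeline` and `load_replace` are hypothetical names, and "loading into an existing table" is done here by deleting the table's rows and inserting the new ones in a single transaction, so a rerun does not duplicate data. It takes a generic DB-API connection using sqlite3's `?` placeholders so it runs without a database server; against PostgreSQL with psycopg2 the placeholders would be `%s`, or a Pandas DataFrame could be written with `DataFrame.to_sql(..., if_exists="append")` through a SQLAlchemy engine.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def load_replace(conn, table: str, columns: Sequence[str], rows: Iterable[Sequence]) -> int:
    """Replace the contents of an existing table in one transaction; return rows inserted.

    `conn` is any DB-API connection using the "?" placeholder style (e.g. sqlite3).
    `table` and `columns` go into the SQL text directly, so they must be trusted values.
    """
    rows = list(rows)
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    cur = conn.cursor()
    try:
        cur.execute(f"DELETE FROM {table}")  # clear old data so reruns do not duplicate rows
        cur.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
        conn.commit()
    except Exception:
        conn.rollback()  # undo the DELETE if an insert fails (needs a non-autocommit connection)
        raise
    return len(rows)


def run_pipeline(
    extract: Callable[[], Iterable[dict]],
    transform: Callable[[dict], Optional[dict]],
    conn,
    table: str,
    columns: Sequence[str],
) -> int:
    """Extract raw records, transform each one, and load the survivors into `table`."""
    cleaned: List[tuple] = []
    for raw in extract():
        row = transform(raw)
        if row is not None:  # transform returns None for records that should be dropped
            cleaned.append(tuple(row[c] for c in columns))
    return load_replace(conn, table, columns, cleaned)
```

Keeping extract and transform as plain callables lets the same driver run each source (Wikipedia, Kaggle, MovieLens) with its own cleaning function.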