Movies-ETL

Project Overview

This project applies the Extract-Transform-Load (ETL) process to raw data that must be cleaned and structured before it can be analyzed. The process has three steps:

Extract
- Read the data, often from multiple sources
Transform
- Clean and structure the data into the desired form
Load
- Write the data into a database for storage

For this exercise, we extracted information about movies released from 1990 to 2018 from three different sources and built an automated pipeline that takes in new data and runs the entire process without manual intervention.
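The three steps and the automated pipeline can be sketched in Python. This is a minimal, self-contained illustration, not the project's code. It swaps the project's pandas, PostgreSQL, and SQLAlchemy stack for the standard-library `csv` and `sqlite3` modules. The `movies` table, its columns, the sample rows, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a tiny CSV standing in for one movie source.
RAW_CSV = """title,release_year
  The Matrix ,1999
Toy Story,1995
Toy Story,1995
,2001
Titanic,not available
"""


def extract(source):
    """Extract: read rows from a CSV file-like object into dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: trim titles, parse years, drop blank titles and duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        title = row["title"].strip()
        if not title:
            continue  # a movie without a title cannot be identified
        year_text = row["release_year"].strip()
        year = int(year_text) if year_text.isdigit() else None  # unparseable -> NULL
        key = (title, year)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append(key)
    return cleaned


def load(records, conn):
    """Load: recreate the (hypothetical) movies table and insert the records."""
    conn.execute("DROP TABLE IF EXISTS movies")
    conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(source, conn):
    """Run all three steps so new data can be reprocessed with one call."""
    load(transform(extract(source)), conn)
    return conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]


if __name__ == "__main__":
    # sqlite3 in memory stands in for the project's PostgreSQL database (assumption).
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(io.StringIO(RAW_CSV), conn))  # prints 3
    conn.close()
```

Dropping and recreating the table on every run keeps the pipeline repeatable: feeding it a new file replaces the previous load instead of appending duplicate rows. With the project's own tools, pandas `DataFrame.to_sql(..., if_exists="replace")` over a SQLAlchemy engine plays a similar role.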

Resources

Software:
- Jupyter Notebook
- Python 3.9.7
- Anaconda 4.11.0
- pgAdmin 4
Libraries:
- Pandas
- SQLAlchemy
- NumPy
- Psycopg2
- re (regular expression operators)
Data Source:
- movies_metadata.csv
- ratings.csv
- wikipedia_movies.json
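The data sources come in two formats, CSV and JSON, so the Extract step has to choose a reader for each file. Below is a minimal sketch using only the standard library. It assumes all three files sit in one directory and that wikipedia_movies.json holds a JSON array of objects; the dictionary keys and function names are hypothetical.

```python
import csv
import json
from pathlib import Path

# File names come from the Data Source list; the keys are hypothetical labels.
SOURCES = {
    "movies_metadata": "movies_metadata.csv",
    "ratings": "ratings.csv",
    "wikipedia_movies": "wikipedia_movies.json",
}


def read_source(path):
    """Read one source file, choosing the parser from its extension."""
    path = Path(path)
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))  # every value is read as a string
    if path.suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)  # assumes a JSON array of objects
    raise ValueError(f"unsupported file type: {path.suffix}")


def extract_all(data_dir):
    """Extract every source into a dict mapping label -> list of records."""
    data_dir = Path(data_dir)
    return {label: read_source(data_dir / name) for label, name in SOURCES.items()}
```

With pandas, which the project lists, `pd.read_csv` and `pd.read_json` cover the same two formats and return DataFrames instead of lists of dictionaries.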

yempi / Movies-ETL