yempi / Movies-ETL

Movies-ETL

Project Overview

This project applies the Extract-Transform-Load (ETL) process to raw data that must be cleaned and structured before it can be analyzed. The process has three steps:

  • Extract

    • Read the data, often from multiple sources
  • Transform

    • Clean and structure the data into the desired form
  • Load

    • Write the data into a database for storage
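The three steps above can be sketched as three small functions. This is a minimal sketch that uses only the Python standard library: `sqlite3` stands in for the PostgreSQL database the project loads into, and the sample records, column names and table name are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for a CSV file and a JSON file.
CSV_TEXT = "title,release_date\nToy Story,1995-10-30\nHeat,1995-12-15\n"
JSON_TEXT = '[{"title": "Heat", "year": "1995"}, {"title": " Se7en ", "year": "1995"}]'


def extract(csv_text, json_text):
    """Extract: read raw records from two differently formatted sources."""
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
    json_rows = json.loads(json_text)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: reduce both sources to (title, year) pairs and merge duplicate titles."""
    movies = {}
    for row in csv_rows:
        movies[row["title"].strip()] = int(row["release_date"][:4])
    for row in json_rows:
        # setdefault keeps the CSV value when a title appears in both sources.
        movies.setdefault(row["title"].strip(), int(row["year"]))
    return sorted(movies.items())


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS movies (title TEXT PRIMARY KEY, year INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO movies VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract(CSV_TEXT, JSON_TEXT)), conn)
print(conn.execute("SELECT title, year FROM movies ORDER BY title").fetchall())
# prints [('Heat', 1995), ('Se7en', 1995), ('Toy Story', 1995)]
```

In the project itself these steps are done with Pandas and SQLAlchemy instead; for example, `DataFrame.to_sql` writes a DataFrame to a table through a SQLAlchemy engine.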

For this exercise, we extracted data on movies from 1990 to 2018 from three different sources and built an automated pipeline that takes in new data and runs the entire process without manual intervention.
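Because the data comes from several sources, part of the transform step is combining them. One possible approach, sketched below with the standard library only, is to summarize the ratings per movie and left-join that summary onto the movie records. The shared key `movie_id`, the `title` and `rating` columns, and the sample values are hypothetical; they are not taken from the project's files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samples: the column names movie_id, title and rating are assumptions.
MOVIES_CSV = "movie_id,title\n1,Toy Story\n2,Heat\n3,Se7en\n"
RATINGS_CSV = "movie_id,rating\n1,4.0\n1,5.0\n2,3.5\n"


def summarize_ratings(ratings_rows):
    """Count and average the ratings for each movie."""
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, sum]
    for row in ratings_rows:
        entry = totals[row["movie_id"]]
        entry[0] += 1
        entry[1] += float(row["rating"])
    return {mid: (count, total / count) for mid, (count, total) in totals.items()}


def attach_ratings(movie_rows, summary):
    """Left-join the rating summary onto every movie; unrated movies get 0 and None."""
    merged = []
    for row in movie_rows:
        count, mean = summary.get(row["movie_id"], (0, None))
        merged.append({**row, "rating_count": count, "rating_mean": mean})
    return merged


movies = list(csv.DictReader(io.StringIO(MOVIES_CSV)))
ratings = list(csv.DictReader(io.StringIO(RATINGS_CSV)))
for movie in attach_ratings(movies, summarize_ratings(ratings)):
    print(movie["title"], movie["rating_count"], movie["rating_mean"])
# prints:
# Toy Story 2 4.5
# Heat 1 3.5
# Se7en 0 None
```

With Pandas, the same idea is usually written as a `groupby` aggregation followed by `DataFrame.merge(..., how="left")`, which keeps movies that have no ratings.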

Resources

  • Software:

    • Jupyter Notebook
    • Python 3.9.7
    • Anaconda 4.11.0
    • pgAdmin 4
  • Libraries:

    • Pandas
    • SQLAlchemy
    • NumPy
    • Psycopg2
    • re (regular expression operators)
  • Data Source:

    • movies_metadata.csv
    • ratings.csv
    • wikipedia_movies.json
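The `re` module listed above is suited to cleaning free-text fields, such as values scraped from Wikipedia, before they are loaded. As an illustration only, assuming money values appear as strings like `$35.5 million` or `$35,000,000` (a hypothetical format, not confirmed by the project), a parser could look like this:

```python
import re

# Matches "$35.5 million" / "$2 billion" style values (case-insensitive).
WORD_FORM = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)
# Matches "$35,000,000" style values written with thousands separators.
NUMBER_FORM = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+)(?!\d)")


def parse_dollars(text):
    """Return the first dollar amount in text as a float, or None if none is found."""
    match = WORD_FORM.search(text)
    if match:
        scale = 1e6 if match.group(2).lower() == "million" else 1e9
        return float(match.group(1)) * scale
    match = NUMBER_FORM.search(text)
    if match:
        return float(match.group(1).replace(",", ""))
    return None


print(parse_dollars("$35.5 million"))  # prints 35500000.0
print(parse_dollars("$35,000,000"))    # prints 35000000.0
print(parse_dollars("unknown"))        # prints None
```

Inside a Pandas workflow, the same patterns can be applied to a whole column with `Series.str.extract`.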

Languages

Language: Jupyter Notebook 100.0%