dbrunik2019 / Movies_ETL

Movies_ETL

Background

Amazing Prime loves the dataset and wants to keep it updated daily. Britta needs your help to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. You'll need to refactor the code from this module into one function that takes in the three files (Wikipedia data, Kaggle metadata, and the MovieLens rating data) and performs the ETL process by adding the data to a PostgreSQL database.
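The single-function pipeline described above might be shaped roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual code: the function name `extract_transform_load`, the table names `movies` and `ratings`, and the shared `imdb_id` column are all hypothetical, and the transform step is reduced to a placeholder join. The `con` argument can be a SQLAlchemy engine pointing at PostgreSQL; pandas also accepts a plain `sqlite3` connection, which is convenient for trying the function without a database server.

```python
import json
import sqlite3

import pandas as pd


def extract_transform_load(wiki_file, kaggle_file, ratings_file, con):
    """Read the three source files and append the results to tables on `con`.

    `con` is a SQLAlchemy engine/connection (e.g. for PostgreSQL) or a
    sqlite3 connection. Table and column names here are assumptions.
    """
    # Extract: the Wikipedia data is JSON, the other two files are CSV.
    with open(wiki_file, mode="r", encoding="utf-8") as f:
        wiki_df = pd.DataFrame(json.load(f))
    kaggle_df = pd.read_csv(kaggle_file, low_memory=False)
    ratings_df = pd.read_csv(ratings_file)

    # Transform (placeholder): join the two movie sources on an assumed
    # shared key. Real cleaning steps would go here; nested values such as
    # lists would need flattening before they can be written to SQL.
    movies_df = pd.merge(
        wiki_df, kaggle_df, on="imdb_id", suffixes=("_wiki", "_kaggle")
    )

    # Load: append so that existing tables keep their earlier rows.
    movies_df.to_sql("movies", con, if_exists="append", index=False)
    ratings_df.to_sql("ratings", con, if_exists="append", index=False)

    return len(movies_df), len(ratings_df)
```

Passing the connection in as an argument, rather than creating it inside the function, keeps credentials out of the ETL code and lets the same function run against a test database.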

Overview of Project

Wikipedia has a ton of information about movies, including budgets and box office returns, cast and crew, production and distribution, and much more. Luckily, one of Britta's coworkers created a script that goes through a list of movies on Wikipedia from 1990 to 2018 and extracts the data from each sidebar into a JSON file. Unfortunately, her coworker can't find the script anymore and has only the JSON file. We'll need to load that JSON file into a pandas DataFrame.
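Loading the JSON file into a DataFrame could look something like this. It is a small sketch that assumes the file holds a JSON array with one object per movie; the helper name `load_wiki_movies` is made up for illustration. Because sidebar fields differ from movie to movie, pandas builds one column per distinct key and fills the gaps with NaN.

```python
import json

import pandas as pd


def load_wiki_movies(path):
    """Load a JSON array of movie records into a pandas DataFrame."""
    with open(path, mode="r", encoding="utf-8") as f:
        records = json.load(f)

    # Assumption: the top level is a list with one dict per movie.
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of movie records")

    # Keys missing from a record become NaN in that row.
    return pd.DataFrame(records)
```

Checking `df.shape` and `df.columns` right after loading is a quick way to see how many distinct sidebar fields the data contains before any cleaning starts.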

Project Deliverables

Deliverable 1: Write an ETL Function to Read Three Data Files
Deliverable 2: Extract and Transform the Wikipedia Data
Deliverable 3: Extract and Transform the Kaggle Data
Deliverable 4: Create the Movie Database
Deliverable 5: A written report on the Movie Database analysis (README.md)

Resources and Before Start Notes

Data Source: ETL Deliverable 1, ETL Deliverable 2, and ETL Deliverable 3
Data Tools: PostgreSQL, pgAdmin
Software: pgAdmin 4.26, Python 3.8.3, Visual Studio Code 1.50.0, Flask 1.0.2

For more information, read the PostgreSQL documentation, including its coverage of data types.

Languages

Language: Jupyter Notebook 100.0%