Elliott-dev / ETL-with-Pandas-Project

Project uses Pandas to create multiple DataFrames from CSV files containing Disneyland Reviews and Chocolate Reviews.. Cleaned those DataFrames, then loaded to PostgreSQL to create a relational database to join everything together.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ETL-with-Pandas (Disneyland, Chocolate Bars)

image image image

Extract: There are 2 different datasets: Disneyland Review Ratings, and Chocolate Bar Ratings; limited to the years 2010-2019 from the public platform Kaggle conducted by Rachael Tatman and Arush Chillar respectively. The raw data can be found in Resources folder.

The datasets used for this project provided information on:

Disneyland Review Ratings & Chocolate Bar Ratings

Each CSV was made into a pandas DataFrame.

Transform:

Copied only the columns needed into a new DataFrame from both tables.

Renamed the column headers both tables.

Added 2 additional columns splitting timestamp into a year column and a month column on Disneyland Table.

Removed NULL values in year column on Disneyland Table.

Cacao Table passthrough.

Load: Created a connection to PostgreSQL database

Checked for a successful connection to the database and confirmed that the tables have been created

Appended DataFrames to tables;

Confirmed successful Load by querying database.

Documentation:

Documentation Containing Business Rules: ETL Mapping Documentation

Instructions: Running the program:

Open Disneyland_and_chocolate_ETL.ipynb

run everything and when prompted, enter pgAdmin password

About

Project uses Pandas to create multiple DataFrames from CSV files containing Disneyland Reviews and Chocolate Reviews.. Cleaned those DataFrames, then loaded to PostgreSQL to create a relational database to join everything together.


Languages

Language:Jupyter Notebook 100.0%