ashleyadrias/udacity-data-engineer-nanodegree-project1-data-modeling-with-postgres

Project Overview:

Create a star schema that optimizes searching for song play transactions.

Project Repository Files:

sql_queries.py contains all your sql queries into the notebooks

test.ipynb displays the first few rows of each table to let you check your database

create_tables.py drops and creates your tables

etl.ipynb reads and processes a single file from song_data and log_data and loads the data into your tables

etl.py reads and processes files from song_data and log_data and loads them into your tables

Database Design:

The Database was broken up into 4 dimensional tables and 1 fact table 4 Dimensional tables: Users, Time, Songs, and Artist 1 Fact table consisted data from all dimensional tables

ETL Process:

Extract log and song data from json to dataframes

Transform into its respective dataframe as per database design

Load into Postgres Database

How to run the program:

Use the jupyter notebooks to learn how the etl.py script works

Open a terminal and run the etl.py script

ashleyadrias / udacity-data-engineer-nanodegree-project1-data-modeling-with-postgres

About

Languages