Completed by Ken Jung, as part of the Udacity Data Engineering Nanodegree Program
A fictional start-up called Sparkify wants to analyze the data it has been collecting on songs and user activity on its new music streaming app. Its analytics team is particularly interested in understanding which songs users are listening to. Currently, there is no easy way to query this data, which resides in a directory of JSON logs of user activity on the app and a directory of JSON metadata on the songs in the app. As the data engineer, the task is to create a Postgres database with tables designed to optimize queries for song play analysis.
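The extract/transform step for the user-activity logs can be sketched roughly as follows. This is a minimal illustration, not the code in `etl.py`: it assumes each log file holds one JSON object per line, that song plays are the records whose `page` field is `"NextSong"`, and that `ts` is a Unix timestamp in milliseconds. These field names, the function name `extract_song_plays`, and the sample records are all hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample log: one JSON object per line.
# Field names (page, ts, userId, song, artist) are assumptions for illustration.
SAMPLE_LOG = """\
{"page": "NextSong", "ts": 1541121934796, "userId": "8", "song": "Song A", "artist": "Artist A"}
{"page": "Home", "ts": 1541122241796, "userId": "8", "song": null, "artist": null}
"""


def extract_song_plays(text):
    """Parse JSON-lines text and keep only song-play events (hypothetical helper)."""
    plays = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("page") != "NextSong":
            continue  # ignore page views that are not song plays
        # ts is assumed to be milliseconds since the Unix epoch
        start_time = datetime.fromtimestamp(record["ts"] / 1000, tz=timezone.utc)
        plays.append({
            "start_time": start_time,
            "user_id": int(record["userId"]),  # userId assumed to be a numeric string
            "song": record["song"],
            "artist": record["artist"],
        })
    return plays


if __name__ == "__main__":
    for play in extract_song_plays(SAMPLE_LOG):
        print(play)
```

Filtering before converting means non-play events never reach the timestamp conversion, so the rows handed to the load step already match the shape of a song-play table.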
- Data: original JSON datasets of song metadata and user activity logs
- create_tables.py: Schema creation
- etl.py: ETL process
- sql_queries.py: SQL queries
- etl.ipynb: ETL helper notebook
- test.ipynb: notebook for running SQL queries against the Postgres database
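The split between `sql_queries.py`, `create_tables.py`, and `etl.py` can be sketched like this. It is a hedged illustration, not the project's actual code: the table layouts, query names, and helper functions (`create_tables`, `load_rows`) are hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the sketch runs without a database server. With `psycopg2` the structure would be the same, but placeholders would be written `%s` instead of `?`.

```python
import sqlite3

# --- sql_queries.py (hypothetical contents): queries kept as plain strings ---
songplay_table_drop = "DROP TABLE IF EXISTS songplays"
user_table_drop = "DROP TABLE IF EXISTS users"

songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL,
    user_id INTEGER NOT NULL,
    song TEXT,
    artist TEXT
)
"""
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id INTEGER PRIMARY KEY,
    first_name TEXT,
    level TEXT
)
"""

songplay_table_insert = (
    "INSERT INTO songplays (start_time, user_id, song, artist) VALUES (?, ?, ?, ?)"
)
user_table_insert = "INSERT INTO users (user_id, first_name, level) VALUES (?, ?, ?)"

drop_table_queries = [songplay_table_drop, user_table_drop]
create_table_queries = [songplay_table_create, user_table_create]


# --- create_tables.py (hypothetical): reset the schema ---
def create_tables(conn):
    cur = conn.cursor()
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
    conn.commit()


# --- etl.py (hypothetical): load already-transformed rows ---
def load_rows(conn, users, songplays):
    cur = conn.cursor()
    cur.executemany(user_table_insert, users)
    cur.executemany(songplay_table_insert, songplays)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the Postgres database
    create_tables(conn)
    load_rows(
        conn,
        users=[(1, "Alice", "free")],  # hypothetical sample row
        songplays=[("2018-11-02 01:25:34", 1, "Song A", "Artist A")],
    )
    print(conn.execute("SELECT COUNT(*) FROM songplays").fetchone()[0])
    conn.close()
```

Keeping every query in one module lets the schema script and the ETL script share the same definitions, and dropping before creating makes the schema script safe to rerun.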