ohikendoit / data_modelling_with_postgres

Project: Data Modeling with Postgres

Completed by Ken Jung, as part of the Udacity Data Engineering Nanodegree Program

Introduction

A fictional start-up called Sparkify wants to analyze the data it has been collecting on songs and user activity in its new music streaming app. The analytics team is particularly interested in understanding which songs users are listening to. Currently, there is no easy way to query the data, which resides in a directory of JSON logs of user activity on the app and a directory of JSON metadata on the songs in the app. The data engineering task is to create a Postgres database with tables designed to optimize queries for song play analysis.

Project Workspace and Files

  • Data: original log and song datasets in JSON format
  • create_tables.py: Schema creation
  • etl.py: ETL process
  • sql_queries.py: SQL queries
  • etl.ipynb: ETL helper notebook
  • test.ipynb: Postgres SQL notebook
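
As a rough illustration of the extract step that a script like etl.py performs, the sketch below walks a data directory and reads JSON records from each file. It is not the code in this repository: the function names `find_json_files` and `read_records` are hypothetical, and it assumes each file stores one JSON object per line.

```python
import json
from pathlib import Path


def find_json_files(root):
    """Return every *.json file under root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.json"))


def read_records(path):
    """Yield one dict per non-empty line of a file.

    Assumption: the file is newline-delimited JSON (one object per line).
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def extract_all(root):
    """Collect the records from every JSON file under root into one list."""
    records = []
    for path in find_json_files(root):
        records.extend(read_records(path))
    return records
```

In a full pipeline, the records returned by `extract_all` would presumably be reshaped into rows and inserted into the Postgres tables using the statements kept in sql_queries.py; that loading step is left out here because it depends on the actual schema.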

About

License: MIT License

