samadarshad/UdacityDataC3

Background

This project extracts song-play data from a sample dataset in an S3 bucket, stages it in Redshift, and transforms it into fact and dimension tables.
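The transform step can be pictured as an INSERT ... SELECT that joins the two staging tables into the fact table. The following is a minimal sketch, not the project's actual SQL: the table and column names (staging_events, staging_songs, songplays, the page = 'NextSong' filter, the title/artist/duration join) are assumptions, and an in-memory SQLite database stands in for Redshift so the sketch runs without a cluster.

```python
import sqlite3

# Hypothetical, trimmed-down table layouts (not the project's real schemas).
DDL = [
    "CREATE TABLE staging_events (ts INTEGER, user_id INTEGER, level TEXT,"
    " song TEXT, artist TEXT, length REAL, page TEXT)",
    "CREATE TABLE staging_songs (song_id TEXT, artist_id TEXT, title TEXT,"
    " artist_name TEXT, duration REAL)",
    "CREATE TABLE songplays (start_time INTEGER, user_id INTEGER, level TEXT,"
    " song_id TEXT, artist_id TEXT)",
]

# Transform: keep only song-play events and match each one to song metadata
# on title, artist name and duration (an assumed matching rule).
SONGPLAY_INSERT = """
INSERT INTO songplays (start_time, user_id, level, song_id, artist_id)
SELECT e.ts, e.user_id, e.level, s.song_id, s.artist_id
FROM staging_events AS e
JOIN staging_songs AS s
  ON e.song = s.title AND e.artist = s.artist_name AND e.length = s.duration
WHERE e.page = 'NextSong'
"""


def load_songplays(conn):
    """Run the fact-table insert through a DB-API cursor and return the row count."""
    cur = conn.cursor()
    cur.execute(SONGPLAY_INSERT)
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM songplays")
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Redshift connection
    for stmt in DDL:
        conn.execute(stmt)
    conn.executemany(
        "INSERT INTO staging_events VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1000, 7, "free", "Song A", "Artist A", 200.0, "NextSong"),
            (2000, 7, "free", "Song A", "Artist A", 200.0, "Home"),  # not a play
        ],
    )
    conn.execute(
        "INSERT INTO staging_songs VALUES ('S1', 'AR1', 'Song A', 'Artist A', 200.0)"
    )
    print(load_songplays(conn))  # 1
```

Doing the join inside the database keeps the transform as set-based SQL rather than row-by-row Python, which is the point of staging the raw data in Redshift first.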

Follow the instructions in L3 Exercise 2 - IaC - Solution.ipynb to set up a Redshift cluster.
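The notebook itself is not reproduced here. As a rough sketch of what an infrastructure-as-code setup can look like, the helper below builds the keyword arguments for boto3's Redshift create_cluster call. The defaults (dc2.large, 4 nodes, port 5439, us-west-2) and all example values are assumptions for illustration, not the notebook's settings; boto3 is imported lazily so the builder can be tried without AWS access.

```python
def cluster_kwargs(identifier, db_name, user, password, role_arn,
                   node_type="dc2.large", num_nodes=4, port=5439):
    """Build keyword arguments for boto3's Redshift create_cluster call."""
    kwargs = {
        "ClusterIdentifier": identifier,
        "DBName": db_name,
        "MasterUsername": user,
        "MasterUserPassword": password,
        "NodeType": node_type,
        "Port": port,
        "IamRoles": [role_arn],  # role that lets COPY read from S3
    }
    # NumberOfNodes is only passed for multi-node clusters.
    if num_nodes > 1:
        kwargs["ClusterType"] = "multi-node"
        kwargs["NumberOfNodes"] = num_nodes
    else:
        kwargs["ClusterType"] = "single-node"
    return kwargs


def request_cluster(region="us-west-2", **cluster_args):
    """Ask AWS for the cluster; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so cluster_kwargs works without boto3 installed

    redshift = boto3.client("redshift", region_name=region)
    return redshift.create_cluster(**cluster_kwargs(**cluster_args))


if __name__ == "__main__":
    # Hypothetical values; print the request instead of sending it.
    print(cluster_kwargs("dwhcluster", "dwh", "dwhuser", "secret",
                         "arn:aws:iam::123456789012:role/dwhRole"))
```

Keeping the request as plain data makes it easy to check before paying for a running cluster.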

The source data is in the udacity-dend S3 bucket: http://udacity-dend.s3.us-west-2.amazonaws.com

Store the cluster HOST (endpoint) and the IAM role ARN from the Redshift setup in dwh.cfg.
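This README names only HOST, ARN and SONG_DATA, so the section and key layout below is an assumption. The sketch reads such a file with Python's configparser and turns the cluster settings into a libpq-style connection string of the kind psycopg2.connect accepts. The host, user and ARN values are made up.

```python
import configparser

# Assumed dwh.cfg layout with made-up values.
SAMPLE_CFG = """
[CLUSTER]
HOST = mycluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dwh
DB_USER = dwhuser
DB_PASSWORD = secret
DB_PORT = 5439

[IAM_ROLE]
ARN = arn:aws:iam::123456789012:role/dwhRole
"""


def connection_string(config):
    """Build a libpq-style DSN from the [CLUSTER] section."""
    c = config["CLUSTER"]  # option lookups are case-insensitive
    return "host={} dbname={} user={} password={} port={}".format(
        c["HOST"], c["DB_NAME"], c["DB_USER"], c["DB_PASSWORD"], c["DB_PORT"]
    )


config = configparser.ConfigParser()
config.read_string(SAMPLE_CFG)  # with the real file: config.read("dwh.cfg")
print(connection_string(config))
print(config["IAM_ROLE"]["ARN"])
```

Note that configparser returns values verbatim, so any quotes written around a value in dwh.cfg stay part of the string.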

Run create_tables.main() to create the tables.

Run etl.main() to load the staging tables and populate the fact and dimension tables.
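A minimal sketch of the create-then-load order, assuming both scripts keep their SQL in lists of strings and execute them through a DB-API connection. The query lists and the create_tables_main / etl_main names are hypothetical, and an in-memory SQLite database stands in for the Redshift connection so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical query lists; in the project these would be Redshift SQL strings.
DROP_TABLES = ["DROP TABLE IF EXISTS users"]
CREATE_TABLES = ["CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT)"]
INSERT_TABLES = ["INSERT INTO users (user_id, level) VALUES (1, 'free')"]


def run_queries(conn, queries):
    """Execute each statement in order, then commit once."""
    cur = conn.cursor()
    for query in queries:
        cur.execute(query)
    conn.commit()


def create_tables_main(conn):
    # Dropping first makes the step safe to re-run from a clean slate.
    run_queries(conn, DROP_TABLES)
    run_queries(conn, CREATE_TABLES)


def etl_main(conn):
    run_queries(conn, INSERT_TABLES)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables_main(conn)  # always run before the ETL step
    etl_main(conn)
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```

Because the create step drops and recreates everything, running the pair twice leaves the same data as running it once.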

To debug load errors, query the stl_load_errors system table: SELECT * FROM stl_load_errors;
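Selecting every column of stl_load_errors is noisy. A sketch of a narrower query, assuming any DB-API cursor (for example one from psycopg2) connected to the cluster; the column names are real STL_LOAD_ERRORS columns, while the helper name and the limit of 5 are arbitrary choices. Several of these columns are fixed-width CHAR, so the helper strips trailing padding.

```python
import sqlite3


def recent_load_errors(cursor, limit=5):
    """Return the most recent COPY errors as dicts with padding stripped."""
    cursor.execute(
        "SELECT starttime, filename, line_number, colname, err_reason "
        "FROM stl_load_errors ORDER BY starttime DESC "
        f"LIMIT {int(limit)}"
    )
    names = [col[0] for col in cursor.description]
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in zip(names, row)}
        for row in cursor.fetchall()
    ]


if __name__ == "__main__":
    # Local mock of the system table so the sketch runs without Redshift.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stl_load_errors (starttime TEXT, filename TEXT,"
                 " line_number INTEGER, colname TEXT, err_reason TEXT)")
    conn.execute("INSERT INTO stl_load_errors VALUES ('2020-01-01 00:00:00',"
                 " 's3://bucket/file.json   ', 3, 'year   ', 'Invalid digit   ')")
    for err in recent_load_errors(conn.cursor()):
        print(err)
```

The filename and line_number columns usually point straight at the offending S3 object and record.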

For faster loading and debugging, use a subset of the song data, e.g. set SONG_DATA='s3://udacity-dend/song-data/A/A'
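A Redshift COPY from an S3 path loads every object whose key begins with that prefix, so pointing SONG_DATA at a deeper prefix shrinks the load. The builder below is a sketch, assuming JSON input with 'auto' field mapping, a hypothetical staging_songs table and a made-up role ARN; it strips surrounding quotes because config values such as SONG_DATA='...' may carry them.

```python
def copy_statement(table, s3_path, role_arn, region="us-west-2"):
    """Build a Redshift COPY for JSON files under an S3 prefix."""
    s3_path = s3_path.strip("'\"")   # config values may include quotes
    role_arn = role_arn.strip("'\"")
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{role_arn}' "
        f"FORMAT AS JSON 'auto' REGION '{region}';"
    )


if __name__ == "__main__":
    # Subset prefix from this README; table name and ARN are hypothetical.
    print(copy_statement("staging_songs",
                         "'s3://udacity-dend/song-data/A/A'",
                         "arn:aws:iam::123456789012:role/dwhRole"))
```

Only the prefix changes between a debug run and a full run; the rest of the statement stays the same.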

Populated staging tables:

- Staging Events
- Staging Songs

Populated tables:

- Artists
- Songs
- Users
- Songplays
- Times
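A quick way to confirm that each table above was actually populated is to count its rows. This sketch assumes a DB-API connection and hypothetical lowercase table names; identifiers are interpolated only from a fixed list, never from user input.

```python
import sqlite3

# Hypothetical table names for the tables listed above.
TABLES = ["staging_events", "staging_songs", "artists", "songs",
          "users", "songplays", "time"]


def row_counts(conn, tables):
    """Return {table: row count} for each table in the fixed list."""
    cur = conn.cursor()
    counts = {}
    for table in tables:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


def empty_tables(counts):
    """Names of tables that loaded no rows, sorted for stable output."""
    return sorted(t for t, n in counts.items() if n == 0)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Redshift connection
    conn.execute("CREATE TABLE users (user_id INTEGER)")
    conn.execute("CREATE TABLE songs (song_id TEXT)")
    conn.execute("INSERT INTO users VALUES (1)")
    counts = row_counts(conn, ["users", "songs"])
    print(counts, empty_tables(counts))
```

An empty table after etl.main() usually means a COPY or join condition matched nothing, which is when the load-errors table is worth checking.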

