hamadalaqeel/data-lake-aws

Introduction

This project builds an ETL pipeline that extracts song and log data from an S3 bucket, processes it with Spark, and loads the results back into S3 as a set of dimensional tables stored as Parquet files. Analysts can then query these tables to keep finding insights into what their users are listening to.
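The extract, transform, load flow described above can be sketched roughly as follows. This is a minimal illustration, not the contents of etl.py: the helper names table_path and copy_json_to_parquet are hypothetical, the deduplication step is only a placeholder for the real transformations, and the function expects an already created SparkSession to be passed in.

```python
import posixpath


def table_path(output_root: str, table_name: str) -> str:
    """Join the output root and a table name into one Parquet directory path."""
    return posixpath.join(output_root.rstrip("/"), table_name)


def copy_json_to_parquet(spark, input_path: str, output_root: str, table_name: str) -> str:
    """Read JSON from input_path, drop exact duplicate rows, write Parquet.

    `spark` is an existing pyspark.sql.SparkSession. The real pipeline would
    select and reshape columns here; that part is omitted because the schema
    is not described in this README.
    """
    df = spark.read.json(input_path)             # extract
    df = df.dropDuplicates()                     # transform (placeholder step)
    target = table_path(output_root, table_name)
    df.write.mode("overwrite").parquet(target)   # load
    return target
```

Assuming a session created with SparkSession.builder.getOrCreate(), a call might look like copy_json_to_parquet(spark, input_path, output_root, "songs"), where both paths and the table name are placeholders you supply.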

Database Schema Design

Instructions

Add your AWS IAM credentials to dl.cfg
Specify desired output data path in the main function of etl.py
Run etl.py
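One way the credentials in dl.cfg could be made available to Spark is sketched below. The layout of dl.cfg is not shown in this README, so the section name AWS and the keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumptions, and load_aws_credentials is a hypothetical helper rather than a function taken from etl.py.

```python
import configparser
import os


def load_aws_credentials(config_path: str = "dl.cfg", section: str = "AWS") -> None:
    """Copy AWS keys from a config file into environment variables.

    Assumed file layout (not confirmed by the project):
        [AWS]
        AWS_ACCESS_KEY_ID = ...
        AWS_SECRET_ACCESS_KEY = ...
    """
    # Interpolation is disabled so that a '%' in a value is read literally.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(config_path):
        raise FileNotFoundError(f"Could not read config file: {config_path}")

    os.environ["AWS_ACCESS_KEY_ID"] = config[section]["AWS_ACCESS_KEY_ID"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = config[section]["AWS_SECRET_ACCESS_KEY"]
```

Calling load_aws_credentials() before the Spark session is created lets the AWS libraries pick the keys up from the environment. Keep dl.cfg out of version control so the keys are never committed.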

About

Project 4 for Data Engineering Nanodegree
