hamadalaqeel / dend-projects

List of Data Engineering Nanodegree projects

Data Modeling with Postgres

The task is to create a star schema in Postgres and develop an ETL pipeline that transfers data from local files into the database. Code can be found here.
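
As a rough illustration of the idea, the sketch below builds a tiny star schema (one fact table, two dimension tables) and loads a few records into it. It uses the standard-library `sqlite3` module as a stand-in for Postgres so that it runs without a database server. The table names, column names, and sample records are hypothetical and not taken from the project code.

```python
import sqlite3

# Hypothetical sample records standing in for rows parsed from local files.
LOG_RECORDS = [
    {"user_id": 1, "first_name": "Ana", "song_id": "S1", "title": "Intro", "ts": 1541106106},
    {"user_id": 2, "first_name": "Ben", "song_id": "S1", "title": "Intro", "ts": 1541106352},
    {"user_id": 1, "first_name": "Ana", "song_id": "S2", "title": "Outro", "ts": 1541106496},
]


def create_schema(conn):
    """Create two dimension tables and one fact table that references them."""
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
        CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE songplays (
            songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
            ts INTEGER NOT NULL,
            user_id INTEGER REFERENCES users (user_id),
            song_id TEXT REFERENCES songs (song_id)
        );
        """
    )


def load(conn, records):
    """Insert each record: dimensions first (skipping duplicates), then the fact row."""
    for r in records:
        conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?)", (r["user_id"], r["first_name"]))
        conn.execute("INSERT OR IGNORE INTO songs VALUES (?, ?)", (r["song_id"], r["title"]))
        conn.execute(
            "INSERT INTO songplays (ts, user_id, song_id) VALUES (?, ?, ?)",
            (r["ts"], r["user_id"], r["song_id"]),
        )
    conn.commit()


def plays_per_song(conn):
    """Example analytic query: join the fact table to a dimension and aggregate."""
    rows = conn.execute(
        """
        SELECT s.title, COUNT(*)
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY s.title
        """
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
load(conn, LOG_RECORDS)
```

Calling `plays_per_song(conn)` then returns one row per song title with its play count. With Postgres, the same structure would typically use a driver such as `psycopg2` and `ON CONFLICT DO NOTHING` in place of `INSERT OR IGNORE`.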

Data Modeling with Apache Cassandra

The objective of this project is to create a NoSQL analytics database in Apache Cassandra for a fictional music streaming service called Sparkify. Code can be found here.

Data Warehouse using AWS

The task is to help the Sparkify startup set up a data warehouse holding data on the songs its users listen to. The project is written in Python and uses Amazon S3 for file storage and Amazon Redshift as the data warehouse. Code can be found here.
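
Loading from S3 into Redshift is commonly done with the Redshift `COPY` command. The sketch below only builds such a statement as a string; it does not connect to AWS. The helper name `build_copy_statement`, the table name, the bucket path, and the IAM role ARN are all hypothetical placeholders, not values from the project.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region="us-west-2", json_option="auto"):
    """Return a Redshift COPY statement that loads JSON files from S3 into a table.

    All argument values used below are illustrative placeholders.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )


# Hypothetical staging table, bucket, and role.
statement = build_copy_statement(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
print(statement)
```

The resulting statement would be executed against the cluster through a database connection; staging tables filled this way can then be transformed into the final warehouse tables with `INSERT ... SELECT` queries.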

Data Lake using AWS

The purpose of this project is to build an ETL pipeline that extracts song and log data from an S3 bucket, processes it with Spark, and loads it back into S3 as a set of dimensional tables stored as Parquet files. Code can be found here.

Data Pipelines with Airflow

The objective is to create custom operators for tasks such as staging the data, filling the data warehouse, and running checks. The tasks are then linked together to form a coherent data flow within the pipeline. Code can be found here.
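
To illustrate what linking tasks into a data flow means, the sketch below models task dependencies as a graph and derives a valid execution order with the standard-library `graphlib` module (Python 3.9+). It is not Airflow code, and the task names are hypothetical. In Airflow itself, such dependencies are usually declared between operator instances with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it starts.
# Task names are hypothetical, chosen to mirror staging, loading, and checking.
DEPENDENCIES = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact_table": {"stage_events", "stage_songs"},
    "load_dimension_tables": {"load_fact_table"},
    "run_quality_checks": {"load_dimension_tables"},
}


def execution_order(dependencies):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

The two staging tasks have no dependency on each other, so a scheduler is free to run them in parallel; the fact table load waits for both.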

Capstone Project

This project aims to answer questions about US immigration, such as: which cities are the most popular for immigration, what the gender distribution of immigrants is, what the visa type distribution of immigrants is, what the average age of immigrants is, and what the average temperature per month per city is. Code can be found here.
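
A few of these questions can be sketched as simple aggregations. The example below uses plain Python on a handful of made-up records; the field names and values are hypothetical and do not come from the project's data, and the project itself may compute these differently.

```python
from collections import Counter
from statistics import mean

# Made-up records for illustration only.
IMMIGRANTS = [
    {"city": "New York", "gender": "F", "visa_type": "B2", "age": 30},
    {"city": "Miami", "gender": "M", "visa_type": "B1", "age": 40},
    {"city": "New York", "gender": "M", "visa_type": "B2", "age": 50},
    {"city": "New York", "gender": "F", "visa_type": "F1", "age": 20},
]


def most_popular_cities(records, n=1):
    """Return the n cities with the most arrivals as (city, count) pairs."""
    return Counter(r["city"] for r in records).most_common(n)


def distribution(records, field):
    """Return the share of records for each distinct value of a field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def average_age(records):
    """Return the mean age across all records."""
    return mean(r["age"] for r in records)


print(most_popular_cities(IMMIGRANTS))
print(distribution(IMMIGRANTS, "gender"))
print(distribution(IMMIGRANTS, "visa_type"))
print(average_age(IMMIGRANTS))
```

The same `distribution` helper covers both the gender and the visa type questions, since each is a share-per-category count over a single field.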
