thyanhbui1412 / Noise-Crime

ETL of NYC Open Data with the Socrata API

This was a team project for the Data Warehouse class at Baruch College. It sparked my curiosity and set me on a data engineering career path. In this project, I learned to build a mini ETL pipeline: I streamed NYC Open Data through the Socrata API into a target data warehouse on GCP, pulled the data from GCP into Jupyter Notebook for data profiling, and then used dbt to build data marts and fact tables in GCP.

I developed Python scripts to stream real-time data: /DWProject-ExtractingNoiseData.py and /DWProject-ExtractingCrimeData.py.
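For readers unfamiliar with the Socrata API, here is a minimal sketch of paged extraction. It is not the code from the scripts above. The dataset id `erm2-nwe9` (311 Service Requests), the `complaint_type` column, and the helper names `build_url`, `fetch_page`, and `extract` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NYC Open Data domain and the 311 Service Requests dataset id.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def build_url(offset, limit=1000, where=None):
    """Build a SODA query URL for one page of results."""
    # A stable $order keeps paging consistent between requests.
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    if where:
        params["$where"] = where
    return BASE_URL + "?" + urlencode(params, safe="$:")


def fetch_page(url):
    """Download one page and parse it as a list of JSON records."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract(where=None, limit=1000, fetch=fetch_page):
    """Yield records page by page until a short or empty page is returned."""
    offset = 0
    while True:
        rows = fetch(build_url(offset, limit, where))
        if not rows:
            break
        yield from rows
        if len(rows) < limit:
            break  # last page reached
        offset += limit
```

A call such as `extract(where="complaint_type like 'Noise%'")` would then yield noise complaints one record at a time, ready to be loaded into the warehouse. The `fetch` parameter is injectable so the paging logic can be exercised without network access.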

I also took a leading part in designing the ETL logic, workflows, and data transformations with dbt, which are documented in /ETL project - 311 NYC Open Data Source.pdf. Working in a team, I learned to maintain well-documented code and processes. If time had permitted, I would have built Python scripts to backfill archived data and load upcoming data, and scheduled a production run in dbt so that the data is continuously streamed and transformed.
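The backfill idea could be sketched as splitting the archive into monthly windows and extracting each window separately. This is only an illustration of the planned work, not existing project code. The `created_date` column and the helper names `month_windows` and `where_clause` are assumptions.

```python
from datetime import date


def month_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end) by calendar month."""
    cur = start
    while cur < end:
        # First day of the following month.
        if cur.month == 12:
            nxt = date(cur.year + 1, 1, 1)
        else:
            nxt = date(cur.year, cur.month + 1, 1)
        nxt = min(nxt, end)  # do not run past the requested end date
        yield cur, nxt
        cur = nxt


def where_clause(lo, hi, column="created_date"):
    """Build a SoQL filter for one half-open window (assumed column name)."""
    return (
        f"{column} >= '{lo.isoformat()}T00:00:00' "
        f"AND {column} < '{hi.isoformat()}T00:00:00'"
    )
```

Each window's filter could be passed as the `$where` parameter of a Socrata API request, so a failed month can be retried without reloading the whole archive.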

Finally, a big thank-you to my teammates for the countless Zoom meetings, the teamwork, and for taking the lead on BI analytics and data visualization.

Please do not use this shared project for your own coursework.

Languages

Language:Python 100.0%