aijazafzaal / ETL-Pipeline-GCP

Uber Data Analytics | Data Engineering GCP Project

Introduction:
The goal of this project is to perform analytics on Uber data using several tools and technologies: Google Cloud Storage, Python, a Compute Engine instance, the Mage data pipeline tool, BigQuery, and Looker Studio.

Architecture
[Figure: architecture diagram]

Technology Used
Programming Language - Python
Google Cloud Platform
  Google Storage
  Compute Instance
  BigQuery
  Looker Studio (potential use)
Modern Data Pipeline Tool - https://www.mage.ai/

Dataset Used
TLC Trip Record Data
Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

More information about the dataset can be found here:

Website - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
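
As a rough illustration of the fields described above, the following minimal Python sketch parses a couple of trip records and derives a trip duration. It is not taken from the project's notebooks. The sample rows are made up, the helper names (parse_trip, load_trips) are hypothetical, and the timestamp format is an assumption; the column names follow the yellow taxi data dictionary linked above.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows for illustration only; the column names follow the
# yellow taxi data dictionary, the values are not real trips.
SAMPLE_CSV = """VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
1,2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,1,9.0
2,2016-03-01 00:10:00,2016-03-01 00:40:00,2,10.00,2,30.0
"""

# Assumed timestamp layout; adjust if the downloaded file differs.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip(row):
    """Convert one CSV row (a dict of strings) into typed fields."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    return {
        "pickup": pickup,
        "dropoff": dropoff,
        # Derived field: trip duration in minutes.
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "payment_type": int(row["payment_type"]),
        "fare_amount": float(row["fare_amount"]),
    }


def load_trips(text):
    """Parse CSV text into a list of typed trip records."""
    return [parse_trip(row) for row in csv.DictReader(io.StringIO(text))]


trips = load_trips(SAMPLE_CSV)
for trip in trips:
    print(trip["pickup"], round(trip["duration_min"], 2), trip["trip_distance"])
```

The sketch uses only the standard library so it runs without extra dependencies; in a notebook, a DataFrame library would likely be more convenient for the same parsing step.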

Data Model:
[Figure: data model diagram]
