aditishraq / Ride-Sharing-ETL-Pipeline

Ride Sharing ETL Pipeline

A data pipeline with GCP Storage, Python, Compute Instance, Mage Data Pipeline Tool, BigQuery, and Looker Studio.

Description

Objective

The goal of this project is to analyze ride-sharing trip data using GCP Storage, Python, a Compute Instance, the Mage Data Pipeline Tool, BigQuery, and Looker Studio. The analysis covers TLC Trip Record Data for yellow and green taxi trips, focusing on pick-up and drop-off dates/times, locations, trip distances, fares, rate types, payment types, and passenger counts. The resulting analytics are intended to support informed decision-making.

Dataset

TLC Trip Record Data

Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

Here is the dataset used - Link

More information about the dataset can be found here:

  1. Website - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
  2. Data Dictionary - https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
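As a rough illustration of the kind of transformation such a pipeline might apply to these fields, the sketch below parses a few trip records and derives time-based attributes, trip duration, and a readable payment type. It is not taken from this repository: the sample rows are made up for illustration, the helper names `transform` and `load_trips` are hypothetical, and only the column names and payment type codes follow the yellow taxi data dictionary linked above.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows for illustration only; the column names follow the
# TLC yellow taxi data dictionary.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,1,9.0
2016-03-01 00:10:00,2016-03-01 00:31:00,2,7.20,2,24.5
"""

# Payment type codes as listed in the data dictionary.
PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def transform(row):
    """Turn one raw CSV row (dict of strings) into a typed, enriched record."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    return {
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday is 0
        "duration_min": (dropoff - pickup).total_seconds() / 60,
        "passenger_count": int(row["passenger_count"]),
        "trip_distance": float(row["trip_distance"]),
        "payment_type_name": PAYMENT_TYPES.get(int(row["payment_type"]), "Unknown"),
        "fare_amount": float(row["fare_amount"]),
    }


def load_trips(text):
    """Parse CSV text and transform every row."""
    return [transform(row) for row in csv.DictReader(io.StringIO(text))]


trips = load_trips(SAMPLE_CSV)
for trip in trips:
    print(trip)
```

In a real run the CSV text would come from the dataset file rather than an inline string, and the enriched records would be loaded into the warehouse instead of printed.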

Tools & Technologies

Architecture

[Image: architecture diagram]

Data Model

[Image: data model diagram]

Dashboard

[Image: dashboard]

Setup

If you already have a Google Cloud account, you can skip the prerequisite steps.
