gupta-aayushkr / Azure-Portfolio

Project 1: NYC Taxi Data Analysis

Overview

  • Analyze NYC taxi data with Azure Synapse Analytics, Apache Spark, and Power BI.
  • Explore seven main tables covering different taxi types and boroughs.

Resources and Architecture

  • Azure Synapse Analytics, Data Lake Storage, Serverless SQL Pool, Apache Spark, and Power BI.
  • End-to-end data processing pipeline for seamless big data handling.

Pipeline & Reporting

  • Synapse Pipelines orchestrate data processing stages with triggers for automation.
  • Power BI generates insightful reports on payment methods and taxi demand.

Budget Analysis

  • Breakdown of costs across Azure Synapse Analytics, the Serverless SQL Pool, Synapse Pipelines, and Data Lake Storage.

Future Enhancements

  • Explore cost optimization, real-time processing, machine learning, data governance, and security measures.

Project 2: Formula 1 Racing

Overview

  • Process Formula 1 racing data with Azure Databricks, ADLS, and ADF.
  • Use the Ergast Developer API as the source for tables such as circuits, races, constructors, drivers, and results.

Resources and Architecture

  • Azure Databricks, Data Lake Gen2, Data Factory, and Key Vault.
  • Step-by-step setup of storage, compute, raw data ingestion, and presentation for analysis.

Pipeline & Orchestration

  • Azure Data Factory orchestrates Ingest, Transform, and Process Pipelines with triggers.
  • Tumbling window triggers and custom triggers automate pipeline runs based on date ranges.
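
A tumbling window trigger fires once per fixed-size, contiguous, non-overlapping time interval, and each run receives the start and end of its own window. The sketch below illustrates only that slicing idea in plain Python. It is not the project's trigger definition: the function name `tumbling_windows`, the 7-day window size, and the dates are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous, non-overlapping [window_start, window_end) intervals.

    Only complete windows are produced; a partial window at the end of the
    range is skipped. (Hypothetical helper, not part of Azure Data Factory.)
    """
    if size <= timedelta(0):
        raise ValueError("size must be a positive duration")
    window_start = start
    while window_start + size <= end:
        window_end = window_start + size
        yield window_start, window_end
        # The next window begins exactly where this one ended.
        window_start = window_end


if __name__ == "__main__":
    # Illustrative date range and window size (assumed values).
    for ws, we in tumbling_windows(
        datetime(2021, 1, 1), datetime(2021, 1, 22), timedelta(days=7)
    ):
        print(f"process files for window {ws.date()} to {we.date()}")
```

In a pipeline, each window's start and end would typically be passed as parameters so that a run processes only the data belonging to its own date range.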

Budget Analysis

  • Total cost: Rs. 3200, exceeding the Rs. 3000 budget by Rs. 200.
  • Control costs with cluster timeout management.
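
The overrun implied by the figures above can be worked out as a short calculation. The two amounts come from this section; the helper name `budget_overrun` is an assumption made for illustration.

```python
from typing import Tuple


def budget_overrun(actual: float, budget: float) -> Tuple[float, float]:
    """Return (absolute overrun, overrun as a percentage of the budget)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    overrun = actual - budget
    return overrun, overrun / budget * 100


# Figures from the Budget Analysis above, in rupees.
ACTUAL_RS = 3200
BUDGET_RS = 3000

overrun_rs, overrun_pct = budget_overrun(ACTUAL_RS, BUDGET_RS)
print(f"Over budget by Rs. {overrun_rs} ({overrun_pct:.1f}%)")
# prints: Over budget by Rs. 200 (6.7%)
```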
