Fencekeeper / Real-Time-Streaming-with-Azure-Databricks

Real-Time-Streaming-with-Azure-Databricks

Project Overview

Welcome to the "Real-Time Streaming with Azure Databricks" repository. This project demonstrates an end-to-end solution for real-time data streaming and analysis using Azure Databricks and Azure Event Hubs, with visualization in Power BI. It is a guide to setting up, configuring, and implementing a streaming data pipeline that follows the medallion architecture (bronze, silver, and gold layers).
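
To make the layering concrete, here is a minimal plain-Python sketch of what each medallion layer typically does to a batch of events. It is not the notebook's code: the notebook uses Spark on Databricks, and the field names (`device_id`, `temperature`) and function names here are assumptions chosen only for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_messages):
    """Bronze: keep every message exactly as received, plus ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{"body": body, "ingested_at": ingested_at} for body in raw_messages]


def to_silver(bronze_rows):
    """Silver: parse the JSON, enforce types, and drop malformed records."""
    silver = []
    for row in bronze_rows:
        try:
            event = json.loads(row["body"])
            silver.append({
                # Hypothetical schema, not taken from the repository.
                "device_id": str(event["device_id"]),
                "temperature": float(event["temperature"]),
            })
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # a real pipeline might route these to a quarantine table
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to a reporting-ready shape (average temperature per device)."""
    readings = defaultdict(list)
    for row in silver_rows:
        readings[row["device_id"]].append(row["temperature"])
    return {device: sum(vals) / len(vals) for device, vals in readings.items()}


raw = [
    '{"device_id": "a", "temperature": 20.0}',
    '{"device_id": "a", "temperature": 22.0}',
    '{"device_id": "b", "temperature": 15.5}',
    'not valid json',
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'a': 21.0, 'b': 15.5}
```

The bronze step never rejects anything, so the raw input can always be replayed. Validation happens only at the silver step, and the gold step produces the small aggregated result that a reporting tool such as Power BI would read.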

Getting Started

To get started with this project, clone the repository and follow the guidance provided in this YouTube tutorial.

Repository Contents

  • Real-time Data Processing with Azure Databricks (and Event Hubs).ipynb: The Databricks notebook used for data processing at each layer of the medallion architecture.
  • data.txt: Contains sample data and JSON structures for streaming simulation.
  • Azure Solution Architecture.png: High-level solution architecture.
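
As a rough illustration of the streaming simulation that data.txt supports, the sketch below generates JSON event bodies and, optionally, sends them to an event hub with the `azure-eventhub` Python SDK. The event fields, the `make_event` and `send_events` helpers, and the connection parameters are all assumptions. The actual JSON structures are the ones in data.txt, and the tutorial may send events a different way.

```python
import json
import random
from datetime import datetime, timezone


def make_event(device_id, rng):
    """Build one JSON event body. The field names are illustrative only."""
    return json.dumps({
        "device_id": device_id,
        "temperature": round(rng.uniform(15.0, 30.0), 1),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


def send_events(connection_string, eventhub_name, bodies):
    """Send JSON bodies to an event hub in a single batch.

    Requires `pip install azure-eventhub`. Both arguments are placeholders
    that you would copy from your own Event Hubs namespace.
    """
    # Imported lazily so the generator above works without the SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        conn_str=connection_string, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        for body in bodies:
            batch.add(EventData(body))  # raises ValueError if the batch is full
        producer.send_batch(batch)


rng = random.Random(42)  # seeded so repeated runs produce the same readings
bodies = [make_event(f"sensor-{i}", rng) for i in range(3)]
print(bodies[0])
```

Calling `send_events("<connection string>", "<event hub name>", bodies)` would publish the three events. Keeping event generation separate from sending makes it possible to inspect the payloads locally before connecting to Azure.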

Prerequisites

  • Active Azure subscription with access to Azure Databricks and Event Hubs.
  • Databricks workspace with Unity Catalog enabled.
  • Azure Event Hubs Service.
  • Power BI Desktop (Windows).
  • Familiarity with Python, Spark, SQL, and basic data engineering concepts.
