tahhnik / Designing-Large-Scale-Data-Warehouse-with-Azure-Synapse-Analytics

The repo simulates query load, optimizes performance, and offers practical guidance for building data warehouses with Azure Synapse Analytics. πŸš€

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge

From Design to Deployment: Data Warehousing with Azure Synapse Analytics

The repo simulates query load, optimizes performance, and offers practical guidance for building data warehouses with Azure Synapse Analytics. πŸš€

The project follows the following architecture,

Infrastructure Diagram

Directories πŸ“‚

  1. data: The datasets, fact, and dimension tables for the data warehouse
  2. Queries_Part4.sql: SQL code for querying and analyzing the created data warehouse
  3. setup.json: ARM (Azure Resource Manager) template. It is a block of code that defines the infrastructure and configuration for the project
  4. setup.ps1: PowerShell script to provision the Azure Synapse Workspace along with the tables of the data warehouse (configured in setup.sql)
  5. setup.sql: SQL script for creating the tables of the data warehouse
  6. table_creation_codes.sql: SQL script to create the fact and dimension tables with dedicated SQL pool

Prerequisites:

  1. An active Azure subscription
  2. Knowledge of Azure Data Fundamentals, Good to begin from here

Getting Started πŸš€

  1. Clone the Repository:
    git clone https://github.com/tahhnik/Designing-Large-Scale-Data-Warehouse-with-Azure-Synapse-Analytics.git
    

To Set Up on the Azure Portal (Detailed walkthrough is in the blog posts)

Clone the repository inside Azure workspace through PowerShell on Azure Portal

git clone https://github.com/tahhnik/Designing-Large-Scale-Data-Warehouse-with-Azure-Synapse-Analytics.git

synapase-ui-shell-gitclone

  1. Explore the Directories: Navigate into each directory to find detailed automation scripts, SQL codes for queries, and configurations.

  2. Follow the Blog: Implementation details and insights are documented in the associated series of blog posts in Medium.

    ⚑If you want to skip the initial processes like data modeling, and schema design and directly jump onto building the warehouse on Azure Synapse Analytics, follow the Blog Three,

    ⚑If you want to skip the building warehouse processes on Azure Synapse Analytics and directly jump onto querying the tables with SQL (T-SQL in this context), follow the Blog Four

Tools Explored πŸ› οΈ

  1. Azure Synapse Analytics: Enterprise analytics service for data warehouses and big data systems.
  2. Azure Portal: Unified console to manage Azure resources.
  3. Azure Stream Analytics: Real-time stream processing engine.
  4. Azure Machine Learning: Cloud service for ML project lifecycle.
  5. Azure Data Lake Storage Gen2: Scalable storage for data lakes.
  6. Power BI: Business analytics service for data visualization.
  7. Azure Function Apps: Serverless applications for event-driven scenarios.
  8. Azure Cosmos DB: Globally distributed, multi-model database.

Blog Implementation πŸ“

To implement this project, follow the step-by-step guide in our detailed blog post. Learn how each tool plays a crucial role in creating and scaling a data warehouse on Azure.

Final Draft

Components:

  • The architecture of the data warehouse
  • The details of the data pipeline
  • Brief Discussions of the tools and processes used
Final Draft (3)

Components:

  • Exploring the attributes of each logical entity in the context of retail companies

  • The details of the schema and developing the snowflake schema

  • schema-design-smol

  • Brief Discussions of the data model

Final Draft (3)

Components:

  • Provisioning the Azure Synapse Analytics Workspace with UI and ARM templates synapase-ui-shell-deploy1

  • Provisioning dedicated SQL pool within Azure Synapse Analytics Workspace

  • Creating the SQL database and tables (facts and dimensions)

  • Loading data into the tables

  • azure-shell-sqlfiles

Blog Four: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 3: Querying the Data Warehouse) (is being written at this moment)

Final Draft (3)

Components:

  • Querying the data warehouse
  • Showcasing the analytical capabilities of Azure Synapse Analytics

Acknowledgments πŸ™Œ

My humble gratitude to my friends and family who are the constant support of my works and endeavors

Connect with Me ⬇️

LinkedIn Twitter

About

The repo simulates query load, optimizes performance, and offers practical guidance for building data warehouses with Azure Synapse Analytics. πŸš€


Languages

Language:TSQL 61.7%Language:PowerShell 38.3%