priye-1 / OLAP_Dimensional_Modeling_for_Advanced_Analytics

This project unlocks the power of advanced analytics and reporting by transforming an OLTP architecture into an efficient OLAP setup. Leverage the capabilities of DBT and BigQuery to implement dimensional modeling, and drive data-driven decision-making.

Northwind Database OLTP to OLAP Transformation: Leveraging Dimensional Modeling for Advanced Analytics

This project unlocks the power of advanced analytics and reporting by transforming an OLTP architecture into an efficient OLAP system. It leverages the capabilities of dbt and BigQuery to implement dimensional modelling and drive data-driven decision-making.

Aim

To modernise Northwind's data reporting solution through dimensional modelling.

Unveiling OLAP for Northwind OLTP Database

What is the current architecture?

  • Northwind Traders is an import-export company that trades specialty foods around the world
  • Northwind is a sample database created by Microsoft to demonstrate the features of some of its products, and for training and tutorials
  • The existing architecture is a mix of on-premises and legacy systems
  • MySQL handles the main daily sales transactions
  • MySQL is also used to build and run reports, which is inefficient because the analytical queries slow down the transactional system

Why the need for a new architecture?

  • For better scalability
  • To improve reporting speed
  • To reduce the load on operational systems
  • To improve data security through better access control

How do we implement a new architecture?

  • Northwind Traders can migrate its existing database to GCP
  • The on-premises MySQL instance can be replaced by fully managed Cloud SQL
  • For reporting, an OLAP data warehouse will be built on GCP using BigQuery
  • The dimensional data warehouse will be built on BigQuery using Kimball's approach, with dimension (dim) and fact tables
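
As a rough illustration of the dim and fact layout mentioned above, here is a minimal star-schema sketch. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery, and every table name, column name, and sample row is an assumption made up for the example rather than taken from this project's models.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; all names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key used by the warehouse
    customer_id  INTEGER,              -- natural key from the OLTP source
    company      TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_id   INTEGER,
    product_name TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    order_date   TEXT,
    quantity     INTEGER,
    unit_price   REAL
);
""")

# Made-up sample rows, only to exercise the schema.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, 101, "Company A"), (2, 102, "Company B")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, 501, "Chai"), (2, 502, "Coffee")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, "2006-01-15", 10, 18.0),
                  (1, 2, "2006-01-20", 5, 46.0),
                  (2, 1, "2006-02-03", 20, 18.0)])


def revenue_by_customer(connection):
    """Join the fact table to one dimension and aggregate a measure."""
    return connection.execute("""
        SELECT c.company, SUM(f.quantity * f.unit_price) AS revenue
        FROM fact_sales AS f
        JOIN dim_customer AS c USING (customer_key)
        GROUP BY c.company
        ORDER BY c.company
    """).fetchall()


result = revenue_by_customer(conn)
print(result)
```

The fact table holds only keys and numeric measures, while descriptive attributes live in the dimensions, so a report is a join plus an aggregation rather than a query against the transactional tables.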

Identifying Business Requirements

Many business processes can be derived from the Northwind database through its E-R diagram. However, we will focus on three:

  • Sales Overview: Overall sales reports that show what is being sold to our customers, what sells the most and where, and what sells the least. The goal is a general overview of how the business is doing.
  • Product Inventory: Understand current inventory levels, which suppliers we have, and how much is being purchased. This will help Northwind improve stock management and potentially land better deals with suppliers.
  • Customer Reporting: Allow customers to understand their purchase orders, including how much and when they are buying, empowering them to make data-driven decisions. Northwind uses this data in combination with its sales data.
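
The Sales Overview questions (what sells the most, what sells the least) reduce to a grouped aggregation. The sketch below shows one possible shape of such a query, again with sqlite3 standing in for BigQuery. The `order_details` layout and the sample rows are hypothetical and are not the real Northwind columns.

```python
import sqlite3

# Hypothetical, simplified order lines; not the actual Northwind schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (product_name TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?)",
    [("Chai", 10), ("Coffee", 5), ("Chai", 20), ("Green Tea", 2), ("Coffee", 8)],
)


def units_sold_ranked(connection):
    """Return (product, total units) pairs, best seller first."""
    return connection.execute("""
        SELECT product_name, SUM(quantity) AS units_sold
        FROM order_details
        GROUP BY product_name
        ORDER BY units_sold DESC, product_name
    """).fetchall()


ranked = units_sold_ranked(conn)
best_seller, worst_seller = ranked[0], ranked[-1]
print(best_seller, worst_seller)
```

Adding a region or date column to the `GROUP BY` would answer the "where" and "when" variants of the same question.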

Identifying required tables from ERD


  • Customers - Customers who buy items from Northwind
  • Employees - People who work for Northwind
  • Orders - Sales order transactions between customers and Northwind
  • Order Details - Details of the orders placed by customers
  • Inventory Transactions - Transaction details for each inventory item
  • Products - Current Northwind products that customers can purchase
  • Shippers - Companies that ship orders from Northwind to customers
  • Suppliers - Companies that supply Northwind with required items
  • Invoices - Invoice created for each order

Proposed Kimball Data Warehouse Architecture

The image below shows the three layers (datasets) created in BigQuery through dbt. They are identified by the "dbt" prefix.

Proposed Data Modelling Concepts

  • Conceptual Data Model

  • Logical Data Model

  • Physical Data Model


Results

  • The new data warehouse uses BigQuery for analytics and business intelligence, which is more efficient than running reports on the previous MySQL system
  • Reporting is derived from One Big Table (OBT) denormalised from the dimensional models
  • The Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights
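
To make the One Big Table idea concrete, the following sketch joins a small fact table to its dimensions and materialises the result as a single wide table. It is a sketch under assumptions: sqlite3 replaces BigQuery, and `obt_sales`, the dim and fact tables, and all rows are invented for illustration. The project's actual dbt models may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Illustrative dimensional model (names and rows are made up).
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales   (customer_key INTEGER, product_key INTEGER,
                           quantity INTEGER, unit_price REAL);

INSERT INTO dim_customer VALUES (1, 'Company A'), (2, 'Company B');
INSERT INTO dim_product  VALUES (1, 'Chai'), (2, 'Coffee');
INSERT INTO fact_sales   VALUES (1, 1, 10, 18.0), (2, 2, 5, 46.0);

-- Denormalise: one wide row per sale, with dimension attributes inlined.
CREATE TABLE obt_sales AS
SELECT c.company,
       p.product_name,
       f.quantity,
       f.unit_price,
       f.quantity * f.unit_price AS line_total
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_product  AS p ON p.product_key  = f.product_key;
""")

rows = conn.execute("SELECT * FROM obt_sales ORDER BY company").fetchall()
print(rows)
```

Reports can then read `obt_sales` directly without repeating the joins, at the cost of storing the dimension attributes redundantly.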

Getting started with the dbt project

  • Commands to install dbt and connect to BigQuery here
  • Commands to create tables and insert data here
  • Commands to create dim and fact tables in the different layers can be found here
  • If you are not able to enable billing for BigQuery on your account, insert data manually by uploading the CSV files located here

Resources:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Join the chat on Slack for live discussions and support
  • Find dbt events near you
  • Check out the blog for the latest news on dbt's development and best practices
