This project implements a data pipeline using industry-standard tools such as dbt, Snowflake, and Airflow. It follows an extract, load, transform (ELT) approach, enabling analytics and reporting within the organization.
Tools Used:
- Snowflake: Cloud-based data warehousing platform.
- dbt (Data Build Tool): SQL-based transformation and modeling tool.
- Airflow: Workflow orchestration platform.
Data Modeling Techniques:
- Fact tables, data marts.
- Snowflake Role-Based Access Control (RBAC) concepts.
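The RBAC concepts above can be sketched in Snowflake SQL. The role, warehouse, database, and user names below are illustrative assumptions, not part of this project:

```sql
-- Illustrative RBAC sketch; TRANSFORM_ROLE, COMPUTE_WH, ANALYTICS,
-- and DBT_USER are assumed names -- adjust to your environment.
create role if not exists transform_role;

-- Allow the role to use compute and read/write the analytics database
grant usage on warehouse compute_wh to role transform_role;
grant usage on database analytics to role transform_role;
grant usage on all schemas in database analytics to role transform_role;
grant select, insert on all tables in schema analytics.staging to role transform_role;

-- Assign the role to the service user that dbt connects as
grant role transform_role to user dbt_user;
```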
Snowflake Environment Setup:
- Create a Snowflake account, warehouse, database, and roles.
- Define necessary schemas for staging and modeling.
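The setup steps above might look like the following in Snowflake SQL (the warehouse, database, and schema names are assumptions for illustration):

```sql
-- Hypothetical names; adjust to your environment.
create warehouse if not exists compute_wh
  warehouse_size = 'XSMALL'
  auto_suspend = 60
  auto_resume = true;

create database if not exists analytics;

-- Separate schemas for staged data and modeled output
create schema if not exists analytics.staging;
create schema if not exists analytics.marts;
```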
Configuration:
- Update `dbt_profile.yaml` with Snowflake connection details.
- Configure source and staging files in the `models/staging` directory.
- Define macros in `macros/pricing.sql` for reusable calculations.
- Configure generic and singular tests for data quality.
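A minimal Snowflake connection profile for dbt looks roughly like this. The profile name and all values below are placeholders; the field names follow the dbt-snowflake adapter:

```yaml
# Hypothetical profile; replace placeholders with your own values.
data_pipeline:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <your_account_identifier>
      user: <your_user>
      password: <your_password>
      role: TRANSFORM_ROLE
      warehouse: COMPUTE_WH
      database: ANALYTICS
      schema: STAGING
      threads: 4
```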
Airflow Deployment:
- Update Dockerfile and requirements.txt for Airflow deployment.
- Add Snowflake connection details in Airflow UI.
- Create a DAG file (`dbt_dag.py`) to orchestrate dbt jobs.
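A minimal sketch of such a DAG could use `BashOperator` to run dbt commands, assuming dbt is installed in the Airflow image and the project lives at `/opt/dbt` (both assumptions):

```python
# dbt_dag.py -- minimal sketch; the project path and schedule are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Build all dbt models, then run the test suite
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt && dbt run",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt && dbt test",
    )

    dbt_run >> dbt_test
```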
Project Structure:
- `models/`: Contains dbt models for staging, intermediate tables, and fact tables.
- `macros/`: Contains reusable SQL macros for calculations.
- `tests/`: Contains SQL scripts for generic and singular tests.
- `dbt_dag.py`: Airflow DAG configuration file.
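As an illustration of what lives in `models/staging`, a staging model might look like this (the model, source, and column names are hypothetical):

```sql
-- models/staging/stg_orders.sql (hypothetical model and source names)
with source as (
    select * from {{ source('raw', 'orders') }}
)

select
    order_id,
    customer_id,
    order_date,
    amount
from source
```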
Getting Started:
- Clone the repository: `git clone https://github.com/your_username/data-pipeline.git`
- Set up the Snowflake environment and configure the necessary files.
- Deploy Airflow with Docker and configure connections.
- Start the Airflow scheduler and webserver: `docker-compose up -d`
- Access the Airflow UI and trigger the `dbt_dag` DAG for execution.