bhanuteja2001 / End-To-End-Free-Tools-Data-Enginner


End-To-End-Free-Tools-Data-Enginner


Introduction

The goal of this project is to perform data analytics while ensuring data quality throughout the entire Extraction, Transformation, and Load (ETL) process, using a range of tools and technologies. It also assists data engineers in their daily tasks by monitoring data flow and quality, helping answer questions such as:

  • Is any data missing from the source?
  • Is the source data within the previously reported standards?
  • Was there any error while reading, transforming, or loading the data into the report?
  • Was there a change in the transformation script?
  • Was there a change in the business rules?

Technology Used

  • Programming Languages - Python and SQL
  • Database - PostgreSQL
  • Ingestion - Airbyte
  • Data Quality - Great Expectations
  • Orchestration - Airflow
  • Data Visualization - Metabase
  • Environment - Docker

Data Model

  • Transactional Database (Source)
  • Data Warehouse (Final Destination)
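The two schemas above can be sketched as a pair of tables. Everything below is illustrative: the real schemas live in the project's diagrams, and only the Quantidade and Nome_tipo columns are taken from the Data Quality examples later in this README; sqlite3 stands in for PostgreSQL so the sketch runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table in the transactional database.
conn.execute("""
    CREATE TABLE vendas (
        id INTEGER PRIMARY KEY,
        Quantidade INTEGER NOT NULL,
        Nome_tipo TEXT NOT NULL
    )
""")

# Hypothetical destination table in the data warehouse, with a load timestamp.
conn.execute("""
    CREATE TABLE dw_vendas (
        id INTEGER PRIMARY KEY,
        Quantidade INTEGER NOT NULL,
        Nome_tipo TEXT NOT NULL,
        loaded_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

conn.execute("INSERT INTO vendas (Quantidade, Nome_tipo) VALUES (3, 'Credito')")
row = conn.execute("SELECT Quantidade, Nome_tipo FROM vendas").fetchone()
print(row)  # (3, 'Credito')
```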

Ingestion

  • Airbyte simplifies data ingestion: it ensures smooth integration between the source and the staging area, monitors schema/column changes, and handles the first full load and the subsequent incremental ones using its own resources.
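The full-load-then-incremental behavior Airbyte automates boils down to tracking a cursor (high-water mark) and syncing only rows past it. A minimal pure-Python sketch of that idea, with illustrative function and field names rather than Airbyte's API:

```python
def incremental_sync(source_rows, state):
    """Return rows newer than the saved cursor and the updated state."""
    cursor = state.get("cursor", 0)  # 0 means no sync yet -> full load
    new_rows = [r for r in source_rows if r["updated_at"] > cursor]
    if new_rows:
        # Advance the high-water mark so the next run skips these rows.
        state["cursor"] = max(r["updated_at"] for r in new_rows)
    return new_rows, state

source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
state = {}
first, state = incremental_sync(source, state)   # first run: full load
source.append({"id": 3, "updated_at": 30})       # a new row arrives
second, state = incremental_sync(source, state)  # second run: only the new row
print(len(first), len(second), state)  # 2 1 {'cursor': 30}
```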

Data Quality

  • We set clear expectations for our data and verify its quality at every step. Each time the data is moved or transformed, a check confirms it matches what the business rules expect; data quality is directly tied to how the business works.

Examples

  • Checking for null values
  • Checking that the values in the Quantidade column follow the business rules and expected data
  • Checking that the values in the Nome_tipo column follow the business rules and expected data
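In the project these checks are Great Expectations rules (e.g. `expect_column_values_to_not_be_null`, `expect_column_values_to_be_between`, `expect_column_values_to_be_in_set`). A pure-Python sketch of the same three ideas; the sample rows, numeric bounds, and allowed set below are illustrative assumptions, not the project's actual rules:

```python
def check_not_null(rows, column):
    """No missing values in the column."""
    return all(r.get(column) is not None for r in rows)

def check_between(rows, column, low, high):
    """All values fall inside an expected numeric range."""
    return all(low <= r[column] <= high for r in rows)

def check_in_set(rows, column, allowed):
    """All values belong to a known set of categories."""
    return all(r[column] in allowed for r in rows)

rows = [
    {"Quantidade": 2, "Nome_tipo": "Credito"},
    {"Quantidade": 5, "Nome_tipo": "Debito"},
]
print(check_not_null(rows, "Quantidade"))                      # True
print(check_between(rows, "Quantidade", 1, 100))               # True
print(check_in_set(rows, "Nome_tipo", {"Credito", "Debito"}))  # True
```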

Orchestration

  • Airflow efficiently and automatically orchestrates all steps of the data workflow. From ingestion to transformation and delivery, it provides robust end-to-end management and monitoring of the data process.
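The project's Airflow DAG chains ingestion, quality checking, transformation, and delivery in sequence. A minimal stand-in for that ordering (task names are illustrative, and a failing quality check halts downstream tasks, mirroring how a failed Airflow task blocks its dependents):

```python
def run_pipeline(tasks):
    """Run (name, callable) pairs in order; stop at the first failure."""
    completed = []
    for name, task in tasks:
        if not task():   # a falsy result stops everything downstream
            break
        completed.append(name)
    return completed

tasks = [
    ("ingest", lambda: True),
    ("quality_check", lambda: True),
    ("transform", lambda: True),
    ("load", lambda: True),
]
print(run_pipeline(tasks))  # ['ingest', 'quality_check', 'transform', 'load']
```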

Data Visualization

  • Metabase creates visualizations and actionable insights from the processed data.


Languages

Jupyter Notebook 88.2%, Python 11.8%