CD-AC / DataEnginner-Leagues_Snowflake

Project focusing on extracting, transforming, and loading (ETL) soccer league data from website into a Snowflake data warehouse, with subsequent analysis using Power BI.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DataEnginner-Leagues_Snowflake

Banner Image

Description

This project is designed to streamline the process of data acquisition from European soccer leagues' standings, available on ESPN's website. Leveraging the power of Apache Airflow, containerized with Docker, we've automated the ETL process to cleanse, transform, and prepare the data for analysis. The cleaned data is temporarily stored in a staging area before being loaded into a final table in Snowflake. Finally, we utilize Power BI to connect to the Snowflake table and generate comprehensive statistical reports on the leagues.

Architecture

Banner Image

Data Visualization with Power BI

Banner Image

Features

  • Data Extraction: Automated scraping of European soccer leagues data from ESPN's website.
  • Data Transformation and Cleaning: Utilization of Apache Airflow to automate the data cleansing and transformation processes.
  • Containerization: Docker containerization of Apache Airflow for enhanced scalability and ease of deployment.
  • Data Storage: Efficient data loading into a Snowflake data warehouse, with a structured approach for both staging and final data storage.
  • Data Analysis: Advanced statistical reporting and analysis using Power BI, connected directly to Snowflake.

About

Project focusing on extracting, transforming, and loading (ETL) soccer league data from website into a Snowflake data warehouse, with subsequent analysis using Power BI.


Languages

Language:Jupyter Notebook 98.7%Language:Python 1.3%Language:Dockerfile 0.0%