There are 56 repositories under the data-engineering-pipeline topic.
A few projects related to Data Engineering, including Data Modeling, cloud infrastructure setup, Data Warehousing, and Data Lake development.
An end-to-end GoodReads data pipeline for building a Data Lake, Data Warehouse, and Analytics Platform.
One framework to develop, deploy and operate data workflows with Python and SQL.
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
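As a minimal sketch of what the stages of such a pipeline might look like (plain Python with hypothetical function names; the described repo runs its transforms on Spark, loads into Redshift, and wires the stages together as Airflow tasks):

```python
# Minimal ETL sketch with hypothetical stage functions. In the described
# pipeline, extract/transform would run on Spark and load would target
# Redshift; here each stage is plain Python so the control flow is clear.

def extract():
    # Stand-in for reading raw records (e.g. from S3).
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "5"}]

def transform(rows):
    # Cast types before warehousing.
    return [{"user": r["user"], "amount": int(r["amount"])} for r in rows]

def load(rows, warehouse):
    # Stand-in for a COPY/INSERT into Redshift.
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
```

In Airflow, each of these callables would typically become its own task (e.g. via `PythonOperator`), with the scheduler enforcing the extract → transform → load ordering.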
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
Code examples showing flow deployment to various types of infrastructure
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Let your pipelines flow through the Python code in xonsh.
Deploy a Prefect flow to serverless AWS Lambda function
Apache Spark Guide
F1 Data Pipeline
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
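The grid-search half of that idea can be sketched in plain Python (the repo would use scikit-learn's `GridSearchCV` over a real text classifier; here the scorer and parameter names are hypothetical stand-ins):

```python
from itertools import product

# Toy grid search: score every hyperparameter combination and keep the
# best. A real pipeline would fit and cross-validate a classifier inside
# cv_score; this scorer is a hypothetical stand-in.

param_grid = {"ngram_max": [1, 2], "min_df": [1, 5]}

def cv_score(params):
    # Hypothetical cross-validation score.
    return 0.8 + 0.05 * params["ngram_max"] - 0.01 * params["min_df"]

best_params, best_score = None, float("-inf")
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = cv_score(params)
    if score > best_score:
        best_params, best_score = params, score
```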
An end-to-end real-time stock market data pipeline built with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apache Kafka and stored in a local Cassandra database.
Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups
Job application challenge: Data Scientist
Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database
A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms them into a user behaviour metric table.
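A sketch of that transform in SQL (via Python's built-in sqlite3; the table and column names are hypothetical and the repo's actual schema and engine may differ). Note the pre-aggregated subqueries: joining the raw tables directly would fan out rows and inflate the sums.

```python
import sqlite3

# Join a user purchase table with a movie review table to derive a
# per-user behaviour metric. Each side is aggregated first so the join
# does not multiply rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_purchase (user_id INT, amount REAL);
CREATE TABLE movie_review (user_id INT, is_positive INT);
INSERT INTO user_purchase VALUES (1, 20.0), (1, 30.0), (2, 10.0);
INSERT INTO movie_review VALUES (1, 1), (1, 0), (2, 1);
""")
rows = conn.execute("""
SELECT p.user_id, p.total_spend, r.positive_ratio
FROM (SELECT user_id, SUM(amount) AS total_spend
      FROM user_purchase GROUP BY user_id) p
JOIN (SELECT user_id, AVG(is_positive * 1.0) AS positive_ratio
      FROM movie_review GROUP BY user_id) r
  ON r.user_id = p.user_id
ORDER BY p.user_id
""").fetchall()
```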
Marshmallow serializer integration with pyspark
Data Engineering Project with Hadoop HDFS and Kafka
A data engineering pipeline for digital marketers.
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
Social media analysis: a scalable, flexibly deployable solution that analyses social media content
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
Data Engineering is like the backbone of data processing, managing data pipelines, warehouses, and lakes. It's the bridge between raw data and actionable insights, powering businesses with efficient data management and analytics.
A streaming ETL pipeline for Realtime Tweet Collection, Analysis and Reporting
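The shape of such a streaming loop can be sketched with a Python generator (all names here are hypothetical stand-ins; the real pipeline would read from the Twitter API, run a real sentiment model, and push results to a reporting sink):

```python
# Streaming ETL sketch: a generator stands in for the tweet stream, each
# item is analysed as it arrives, and a running report is updated
# incrementally instead of in one batch.

def tweet_stream():
    # Stand-in for a live collector.
    yield {"text": "pipelines are great"}
    yield {"text": "this outage is terrible"}

def analyse(tweet):
    # Toy keyword rule in place of a real sentiment model.
    return "neg" if "terrible" in tweet["text"] else "pos"

report = {"pos": 0, "neg": 0}
for tweet in tweet_stream():
    report[analyse(tweet)] += 1
```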
Project demonstrating how to automate Prefect 2.0 deployments to AWS EKS
Solution for the Ultimate Student Hunt Challenge (1st place).
A Data Engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data from Kaggle and the YouTube API.
The goal of this project is to analyse the impact of Covid-19 on the aviation industry through data engineering processes using technologies such as Apache Airflow, Apache Spark, Tableau, and a couple of AWS services
Get started with Prefect by scheduling your Prefect flows with GitHub Actions
End-to-end data engineering processes for the NIGERIA Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL and Airflow
An end-to-end data pipeline for building a Data Lake and supporting reporting using Apache Spark.
An ETL project: extracts data from an e-commerce transactional database on RDS, transforms it with an AWS Glue job, loads it into a Redshift data warehouse, and connects it to Tableau for BI
Docker powered starter for geospatial analysis of lightning atmospheric data.