There are 6 repositories under the datapipeline topic.
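Big data collection and extraction platform. zdh_web is the visual management platform for the zdh series of services, with modules for data collection, scheduling, permissions, approval workflows, and private-domain marketing.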
Roadmap for Data Engineering
High-performance TensorFlow data pipeline with state-of-the-art augmentations and low-level optimizations.
Tensorflow 2 Tutorials (use tensorflow and keras in a better way!)
Terraform module designed to easily back up EFS filesystems to S3 using DataPipeline
Building a JSON data pipeline within Snowflake using Streams and Tasks
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
Ethereum client written in Go, modified for full-hierarchy data exports and block specimen production
Domain-specific language to help build and maintain AWS Data Pipelines
Awesome list for datapipeline
A GitHub Action to lint, test, build-docs, package, and run your kedro pipelines. Supports any Python version you'll give it (that is also supported by pyenv).
High speed message passing between various queues and services
Materials for the course Introduction to Data Engineering: data pipelines
Global Tree Cover Loss Analysis using GeoTrellis and Spark
Simple Airflow on Kubernetes (GKE)
Distributed data pipeline for data processing, built on Reactive Streams. Currently supports Kafka, JDBC, Kudu, Elasticsearch, HDFS, etc.
GTFS Data Pipeline for TfNSW Bus Datasets
An ETL project that extracts data from an e-commerce transactional database on RDS, transforms it with an AWS Glue job, loads it into a Redshift data warehouse, and connects it to Tableau for BI
An ETL data pipeline that extracts data from a source and loads it to a destination, automated using mage.ai
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.
This is an end-to-end Azure data engineering project | Analysis of the entire ETL pipeline - Azure Factory, Azure Lake Gen 2, Databricks, Azure Synapse Analytics & dashboards
A project that demonstrates building a data pipeline by scraping data with the Twitter API and creating a Kinesis Firehose delivery stream to ingest the data into Amazon S3.
dbt and ClickHouse test project with Dagster
This project predicts wind turbine failure from sensor data using classification-based ML models, improving prediction by tuning model hyperparameters and addressing class imbalance through oversampling and undersampling. The final model is productionized using a data pipeline.
The Centralized Data Warehouse and ML Solution for Banking Analytics is a project that combines a centralized repository for banking data with machine learning algorithms to enable predictive analysis.
A data pipeline to analyze real-time cryptocurrency prices
A data pipeline that pulls public transport data daily from the opentransportdata.swiss portal. The pipeline has three tasks: pull the right data from opentransportdata.swiss, push the data to S3 for storage, and transform and load the transformed data into a database. Hopefully this repository helps people explain ETL / batch data pipelines.
An automated data pipeline using Apache Airflow that performs ETL on raw data with the Pandas library, stages the data in PostgreSQL, processes it in parallel on a distributed cluster using Spark, and loads the final data into an Elasticsearch NoSQL database