There are 63 repositories under the data-pipeline topic.
Privacy- and security-focused Segment alternative, written in Golang and React
Memphis.dev is a highly scalable and effortless data streaming platform
A list of useful resources to learn Data Engineering from scratch
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
A lightweight stream processing library for Go
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)
Example end-to-end data engineering project.
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to tools like Hightouch, Census, and RudderStack.
A list about Apache Kafka
Practical Data Engineering: A Hands-On Real-Estate Project Guide
Streaming reactive and dataflow graphs in Python
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
A Clojure machine learning library
Pipebird is open source infrastructure for securely sharing data with customers.
Content for architecting a data science platform for products using Luigi, Spark & Flask.
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript