There are 44 repositories under the data-pipelines topic.
Apache DolphinScheduler is a modern data orchestration platform for quickly creating high-performance workflows with low code.
An orchestration platform for the development, production, and observation of data assets.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
The best place to learn data engineering. Built and maintained by the data engineering community.
Dataform is a framework for managing SQL-based data operations in BigQuery.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
One framework to develop, deploy and operate data workflows with Python and SQL.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Work with your web service, database, and streaming schemas in a single format.
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Relational data pipelines for the science lab
An open-source PHP reporting framework that helps you write data reports and build dashboards in PHP. Works with all PHP versions from 5.6 to 8.0. Fully compatible with MVC frameworks such as Laravel, CodeIgniter, and Symfony.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Cloud-native, data onboarding architecture for Google Cloud Datasets
Data pipelines from reusable components
Smart Automation Tool for building modern Data Lakes and Data Pipelines
A data pipeline that automates data warehouse ETL using custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
Beneath is a serverless real-time data platform ⚡️
Multi-hop declarative data pipelines