There are 5 repositories under the data-orchestration topic.
Orchestrate everything - from scripts to data, infra, AI, and business - as code, with UI and AI Copilot. Simple. Fast. Scalable.
An open source, standard data file format for graph data storage and retrieval.
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualizations such as analytical dashboards.
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), and cloud provider services
Data-aware orchestration with Dagster, dbt, and Airbyte
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.
A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran
Get started with Dagster ASAP
An operator for managing the Alluxio system on a Kubernetes cluster
CI/CD repository template to automate deployments of your production flows
A simple pipeline infrastructure with an ETL pipeline running in a Docker environment, using Apache Airflow for orchestration and Postgres for data warehousing
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
EHR pipeline that simulates MIMIC-IV patient data streams, performs advanced feature engineering and clinical severity scoring using machine learning (Random Forest Classifier), and prepares structured outputs for scalable downstream analytics
Develop a real-time data ingestion pipeline using Kafka and Spark. Collect minute-level stock data from Yahoo Finance, ingest it into Kafka, and process it with Spark Streaming, storing the results in Cassandra. Orchestrate the workflow using Airflow deployed on Docker.
Build an ELT pipeline with Dagster and dbt to schedule loading of HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard for exploring insights.
Data orchestration repo with Docker deployment
Working with SCD Type (Change Data Capture) and need a Data Vault model to test Azure Data Factory v2? This code will help!
Code, scripts, and resources for the Data Engineering Fundamentals Course Webinar, covering Python, data pipelines, Apache Airflow, and more.
Prefect - Data orchestration tool practice & learning
Code examples of Luigi, Prefect, Kedro, Dagster, and MageAI
This project provides a complete, containerized environment for deploying Apache Airflow, designed for both robust local development and production use. It uses a multi-stage docker-compose setup and a Makefile for a simplified and professional developer experience.
MCP-native knowledge graph orchestrator that unifies data silos with GraphRAG, dynamic connectors, and local AI.
A poor man's data lake filled with ducks
End-to-end data engineering project
☕ Data Orchestrator. Without abstractions
❌ Full-stack data orchestration configured through YAML templates, with Flask & HTMX
ITBA MLOps course. Data and model orchestration with Dagster, Airbyte, dbt, and MLflow
Repository to store scripts and notes on Airflow
A modular data platform for end-to-end analytics, data pipeline orchestration, and a machine learning model registry, for local open-source environments or a commercial cloud setup.
This project orchestrates an end-to-end data pipeline for an e-commerce dataset using Apache Airflow (in Docker) and a separate dbt (data build tool) project. The pipeline transforms raw source data into structured, analytics-ready datasets.
DSL for drummond, a data pipeline orchestrator
I created this repo to follow along with the examples in the Dagster University Essentials course.
📄 Weather data pipeline using Apache Airflow, dbt, and PostgreSQL. Automates the ingestion, transformation, and visualization of data from the WeatherStack API.