There are 471 repositories under the data-engineering topic.
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Workflow Engine for Kubernetes
An orchestration platform for the development, production, and observation of data assets.
Roadmap to becoming a data engineer in 2021
Always know what to expect from your data.
Fancy stream processing made operationally mundane
Real-time event streaming platform. Streaming CDC, stream processing, low-latency serving, and Iceberg management.
Open Source Feature Flagging and A/B Testing Platform
The open source ELT framework powered by Apache Arrow
Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Privacy- and security-focused Segment alternative, in Golang and React
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Data Science Roadmap from A to Z
A list of useful resources to learn Data Engineering from scratch
Spreadsheet with AI, Code, Connections
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
A collection of scientific methods, processes, algorithms, and systems to build stories & models.
Memphis.dev is a highly scalable and effortless data streaming platform
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.