There are 413 repositories under the data-engineering topic.
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Free Data Engineering course!
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Workflow Engine for Kubernetes
Roadmap to becoming a data engineer in 2021
An orchestration platform for the development, production, and observation of data assets.
Always know what to expect from your data.
Fancy stream processing made operationally mundane
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
SQL stream processing, analytics, and management. We decouple storage and compute to offer efficient joins, instant failover, dynamic scaling, speedy bootstrapping, and concurrent query serving.
Open Source Feature Flagging and A/B Testing Platform
The open source high performance ELT framework powered by Apache Arrow
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
Privacy and Security focused Segment-alternative, in Golang and React
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
A collection of scientific methods, processes, algorithms, and systems to build stories & models.
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
A list of useful resources to learn Data Engineering from scratch
Memphis.dev is a highly scalable and effortless data streaming platform
Data Science Roadmap from A to Z
Quadratic | Technical Spreadsheet with Python, SQL, and AI
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
CSVs sliced, diced & analyzed.