There are 174 repositories under the data-engineering topic.
Apache Superset is a Data Visualization and Data Exploration Platform
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
The Data Engineering Cookbook
Roadmap to becoming a data engineer in 2021
The easiest way to automate your data
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Always know what to expect from your data.
An orchestration platform for the development, production, and observation of data assets.
Free Data Engineering course!
Fancy stream processing made operationally mundane
Feature Store for Machine Learning
Open Source Feature Flagging and A/B Testing Platform
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Git-like capabilities for your object storage
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
A list of useful resources to learn Data Engineering from scratch
📊 📋 Dashboards using YAML or JSON files
A low-code Machine Learning service that personalizes articles, listings, search results, and recommendations to boost user engagement. A friendly Learn-to-Rank engine.
Quilt is a self-organizing data hub for S3
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep Learning, Computer Science, Programming, Software Engineering, etc.
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
An end-to-end GoodReads Data Pipeline for building a Data Lake, Data Warehouse, and Analytics Platform.
Clean APIs for data cleaning. A Python implementation of the R package Janitor
Example project implementing best practices for PySpark ETL jobs and applications.
A Data Engineering & Machine Learning Knowledge Hub
Data profiling, testing, and monitoring for SQL-accessible data.
A few projects related to Data Engineering, including Data Modeling, cloud infrastructure setup, Data Warehousing, and Data Lake development.
Accumulated knowledge and experience in the field of Data Engineering
An Awesome List of Open-Source Data Engineering Projects
Feathr – An Enterprise-Grade, High Performance Feature Store
Machine Learning automation and tracking
Datart is a next generation Data Visualization Open Platform
Polyglot workflows without leaving the comfort of your technology stack.
Open Metadata and Governance