There are 9 repositories under the delta-lake topic.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Real-time analytics on Postgres tables
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
An open protocol for secure data sharing
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Analytical database for data-driven Web applications 🪶
Amazon SageMaker Local Mode Examples
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Sample project to demonstrate data engineering best practices
The Internals of Delta Lake
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
A Minimalistic Rust Implementation of Delta Sharing Server.
This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
A Delta Lake reader for Dask
Module 1 exercises - Bootcamp EDC - IGTI 2021
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc. for different levels that can be useful for Data Scientists, Developers and everyone who is interested in STEM.
Native Delta Lake Implementation in Go
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-rc1 release. Documentation here - https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore, Minio, Postgres)
Spark data pipeline that processes movie ratings data.
dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats
Spark Structured Streaming examples using version 3.5.1
Building a Data Lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the Lakehouse, plus a visualization and recommendation app.
Stream Loader for Apache Doris
Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.