There are 7 repositories under the delta-lake topic.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
An open protocol for secure data sharing
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Analytical database for data-driven Web applications 🪶
Amazon SageMaker Local Mode Examples
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
The Internals of Delta Lake
Sample project to demonstrate data engineering best practices
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
A Minimalistic Rust Implementation of Delta Sharing Server.
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Exercises for module 1 - Bootcamp EDC - IGTI 2021
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
A Delta Lake reader for Dask
This repository exemplifies a simple ELT process that uses Delta Lake to perform upserts and to remove data files that aren't in the latest state of the table's transaction log.
Books and papers in Mathematics, Econometrics, Machine Learning, Finance, etc., at different levels, that can be useful for Data Scientists, Developers, and everyone who is interested in STEM.
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
Native Delta Lake Implementation in Go
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
DataPulse is a platform for developers to build, schedule and monitor data pipelines.
Spark data pipeline that processes movie ratings data.
Spark Structured Streaming examples using version 3.5.1
dbt (data build tool) projects targeting AWS analytics services (Redshift, Glue, EMR, Athena) and open table formats
Repository dedicated to the Data Lakehouse with Delta Lake workshop
Template to spin up Delta Lake locally using Docker
Stream Loader for Apache Doris
Awesome content all about Azure Databricks