delta-lake

There are 9 repositories under the delta-lake topic.

apache / doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
agent ai bigquery database dbt delta-lake elt hudi iceberg lakehouse olap paimon query-engine real-time redshift snowflake spark sql
Language:Java 14269
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
analytics big-data data-science database databases datalake delta-lake distributed-database distributed-systems hadoop hive iceberg java jdbc presto prestodb query-engine sql trino
Language:Java 11871
StarRocks / starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
analytics big-data cloudnative database datalake delta-lake distributed-database hudi iceberg join lakehouse lakehouse-platform mpp olap real-time-analytics real-time-updates realtime-database sql star-schema vectorized
Language:Java 10663
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
acid analytics big-data delta-lake spark
Language:Scala 8271
roapi / roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
sql graphql arrow rest-api analytics query columnar rust in-memory-database datafusion blob-storage cloud-native parquet query-frontends datasets static-datasets s3 delta-lake
Language:Rust 3340
delta-io / delta-rs
A native Rust library for Delta Lake, with bindings into Python
databricks delta delta-lake pandas pandas-dataframe python rust
Language:Rust 2946
Mooncake-Labs / pg_mooncake
Real-time analytics on Postgres tables
analytics columnstore delta-lake iceberg lakehouse parquet postgresql
Language:Rust 1674
databricks / LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache-spark spark structured-streaming spark-sql spark-mllib mllib mlflow delta-lake
Language:Scala 1339
apache / incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
apache-hudi apache-iceberg delta-lake
Language:Java 1101
delta-io / delta-sharing
An open protocol for secure data sharing
big-data data-sharing delta-lake pandas spark
Language:Scala 870
Nike-Inc / koheesio
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
data-engineering delta-lake pydantic pyspark python
Language:Python 630
splitgraph / seafowl
Analytical database for data-driven Web applications 🪶
database http sql api edge serverless visualization rust datafusion delta-lake delta-rs
Language:Rust 495
aws-samples / amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
sagemaker amazon-sagemaker pytorch catboost lightgbm pycharm tensorflow-training pytorch-training sagemaker-processing prophet scikit-learn prophet-model hdbscan-clustering-algorithm huggingface huggingface-transformers machine-learning delta-lake gensim-word2vec dask tensorflow
Language:Python 259
adidas / lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse spark
Language:Python 240
josephmachado / data_engineering_best_practices
Sample project to demonstrate data engineering best practices
data-engineering delta-lake etl great-expectations minio pyspark spark
Language:Python 196
japila-books / delta-lake-internals
The Internals of Delta Lake
deltalake book internals delta-lake books datalake
183
izhangzhihao / Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
flink data-warehouse real-time-data-warehouse data-warehousing flink-sql debezium kafka elasticsearch delta-lake cdc change-data-capture hudi hoodie iceberg sql datalake delta deltalake spark spark-sql
Language:Dockerfile 117
delta-incubator / delta-sharing-rs
A Minimalistic Rust Implementation of Delta Sharing Server.
axum data-engineering delta-io delta-lake rust
Language:Rust 89
anneglienke / 101_upsert-delta
This repository demonstrates a simple ELT process that uses Delta to perform upserts and to remove data files that are no longer part of the latest state of the table's transaction log.
delta delta-lake deltalake
Language:Python 86
tikal-fuseday / delta-architecture
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
data-pipeline databases debezium delta-lake kafka spark streams
Language:HTML 75
lhbench / lhbench
Lakehouse storage system benchmark
apache-hudi apache-iceberg lakehouse benchmark cidr database databricks delta-lake
Language:Scala 72
dask-contrib / dask-deltatable
A Delta Lake reader for Dask
dask dask-dataframes delta-lake parquet python
Language:Python 49
neylsoncrepalde / edc-mod1-exercise-igti
Module 1 exercises - EDC Bootcamp - IGTI 2021
aws delta-lake emr spark terraform
Language:Python 49
dacort / modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
aws amazon-emr hudi iceberg apache-hudi apache-iceberg delta-lake
Language:Jupyter Notebook 47
jeppe742 / DeltaLakeReader
Read Delta tables without any Spark
delta-tables delta-lake
Language:Python 47
ysfesr / Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
delta-lake docker hive lakehouse minio presto s3-storage spark
Language:Python 45
TatevKaren / free-resources-books-papers
Books and papers in mathematics, econometrics, machine learning, finance, etc., for different levels, useful for data scientists, developers, and everyone who is interested in STEM.
free-resources free-books machine-learning econometrics mathematics books databricks delta-lake developers data-science statistics
41
csimplestring / delta-go
Native Delta Lake Implementation in Go
bigdata databricks dataprocessing delta-lake golang infrastructure spark
Language:Go 40
databrickslabs / delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
delta delta-lake metrics centralized lakehouse monitoring databricks
Language:Scala 39
thanhENC / e2e-data-platform
End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore, Minio, Postgres)
adventureworks airflow data-pipeline data-platform dbt delta-lake docker-compose end-to-end hive-metastore lightdash spark trino
Language:Python 34
guidok91 / spark-movies-etl
Spark data pipeline that processes movie ratings data.
apache-airflow apache-iceberg data-engineering data-pipeline elt etl pyspark spark uv
Language:Python 30
jaehyeon-kim / dbt-on-aws
dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats
athena dbt delta-lake emr glue hudi iceberg redshift
Language:HCL 29
AndrewKuzmin / spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.5.1
spark spark-sql spark-structured-streaming apache-spark structured-streaming delta-lake
Language:Scala 26
harrydevforlife / building-lakehouse
Building a data lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.
airflow dbt delta-lake flask-api hive-metastore lakehouse metabase minio python s3 spark
Language:Python 24
apache / doris-streamloader
Stream Loader for Apache Doris
bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql
Language:Go 23
PFund-Software-Ltd / pfeed
Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store. Supports Data Lakehouse Architecture. Clean Once and Forget.
algo-trading backtesting data-lakehouse data-pipeline data-storage delta-lake historical-data pandas polars streaming
Language:Python 23