Yann Byron's starred repositories

lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.

Language: Rust | License: Apache-2.0 | Stargazers: 3649 | Issues: 0

feldera

Feldera Continuous Analytics Platform

Language: Rust | License: NOASSERTION | Stargazers: 324 | Issues: 0

datafusion

Apache DataFusion SQL Query Engine

Language: Rust | License: Apache-2.0 | Stargazers: 5652 | Issues: 0

gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

Language: Java | License: Apache-2.0 | Stargazers: 764 | Issues: 0

delta-lake-internals

The Internals of Delta Lake

License: Apache-2.0 | Stargazers: 179 | Issues: 0

spark-sql-internals

The Internals of Spark SQL

License: Apache-2.0 | Stargazers: 447 | Issues: 0

duckdb

DuckDB is an analytical in-process SQL database management system

Language: C++ | License: MIT | Stargazers: 21185 | Issues: 0

iceberg

Apache Iceberg

Language: Java | License: Apache-2.0 | Stargazers: 5962 | Issues: 0

Language: Java | License: Apache-2.0 | Stargazers: 75 | Issues: 0

incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Language: Scala | License: Apache-2.0 | Stargazers: 1086 | Issues: 0

velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

License: Apache-2.0 | Stargazers: 1 | Issues: 0

carbondata

High performance data store solution

Language: Scala | License: Apache-2.0 | Stargazers: 1427 | Issues: 0

seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Language: Java | License: Apache-2.0 | Stargazers: 7663 | Issues: 0

celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Language: Java | License: Apache-2.0 | Stargazers: 831 | Issues: 0

velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Language: C++ | License: Apache-2.0 | Stargazers: 3318 | Issues: 0

incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

Language: Java | License: Apache-2.0 | Stargazers: 363 | Issues: 0

incubator-toree

Mirror of Apache Toree (Incubating)

Language: Scala | License: Apache-2.0 | Stargazers: 736 | Issues: 0

Language: Java | License: Apache-2.0 | Stargazers: 13 | Issues: 0

nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Language: Java | License: Apache-2.0 | Stargazers: 929 | Issues: 0

Firestorm

Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers

Language: Java | License: NOASSERTION | Stargazers: 249 | Issues: 0

coder2gwy

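The internet's first guide for programmers taking the civil service exam, presented jointly by three former big-tech programmers who have since joined the public sector.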

Stargazers: 25791 | Issues: 0

CoolplaySpark

Coolplay Spark: Spark source code analysis, Spark libraries, and more

Language: Scala | Stargazers: 3458 | Issues: 0