Yann Byron's repositories
incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
datafusion
Apache DataFusion SQL Query Engine
datafusion-comet
Apache DataFusion Comet Spark Accelerator
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
hudi
Upserts, Deletes And Incremental Processing on Big Data.
spark
Mirror of Apache Spark
delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
incubator-celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
arctic
Arctic is a streaming lake warehouse service open sourced by NetEase
connectors
Connectors for Delta Lake
incubator-toree
Mirror of Apache Toree (Incubating)
hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
coder2gwy
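The internet's first civil-service exam guide for programmers, jointly written by three former big-tech programmers who have since joined the public sector.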
flink
Apache Flink
flink-learning
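flink learning blog. http://www.54tianzhisheng.cn Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Please support my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" (《大数据实时计算引擎 Flink 实战与性能优化》).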
simple-rpc
A simple rpc framework.
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
dr-elephant
Performance monitoring and tuning tool for Apache Hadoop
git
Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements.
prog-scala-2nd-ed-code-examples
The code examples used in Programming Scala, 2nd Edition (O'Reilly)
CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
hbase-rdd
Spark RDD to read and write from HBase
framework
Lift Framework
scikit-learn
scikit-learn: machine learning in Python
MyNotes
Self-written notes that may be useful
data-algorithms-book
MapReduce and Spark Source Code and Scripts for Data Algorithms Book