Yann Byron's repositories
arctic
Arctic is a streaming lake warehouse service open sourced by NetEase
coder2gwy
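The internet's first civil service exam guide for programmers, jointly presented by three former big-tech programmers who have since joined the public sector.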
connectors
Connectors for Delta Lake
CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
data-algorithms-book
MapReduce and Spark Source Code and Scripts for Data Algorithms Book
delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
dr-elephant
Performance monitoring and tuning tool for Apache Hadoop
flink
Apache Flink
flink-learning
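Flink learning blog. http://www.54tianzhisheng.cn Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, as well as large-scale production case studies (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".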
framework
Lift Framework
git
Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements.
hbase-rdd
Spark RDD to read and write from HBase
hudi
Upserts, Deletes And Incremental Processing on Big Data.
hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
incubator-celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
incubator-toree
Mirror of Apache Toree (Incubating)
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
mcsapi_python
Meituan Cloud API Python SDK and client
MyNotes
Self-written notes that may be useful
prog-scala-2nd-ed-code-examples
The code examples used in Programming Scala, 2nd Edition (O'Reilly)
scala
The Scala programming language
scikit-learn
scikit-learn: machine learning in Python
simple-rpc
A simple RPC framework.
spark
Mirror of Apache Spark
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow