Rui Wang's repositories
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
atlas
Mirror of Apache Atlas
awesome-streaming
A curated list of awesome streaming frameworks, applications, etc.
bahir-flink
Mirror of Apache Bahir Flink
db-readings
Readings in Databases
DeepLearning-500-questions
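500 Questions on Deep Learning: a question-and-answer treatment of frequently discussed topics in probability, linear algebra, machine learning, deep learning, and computer vision, written to help the author and any readers who need it. The book has 15 chapters and nearly 200,000 characters. Given the author's limited expertise, readers are welcome to point out any errors. To be continued. For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan, 2018.06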
DistributedSystem-Series
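:books: Distributed infrastructure explained in accessible terms: Linux and Operating Systems | Distributed Systems | Distributed Computing | Databases | Networking | Virtualization and Orchestration | Big Data and Cloud Computing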
drill
Apache Drill
flinkStreamSQL
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables
gandiva
Vectorized processing for Apache Arrow
hive
Mirror of Apache Hive
incubator-gearpump
Mirror of Apache Gearpump (Incubating)
incubator-gossip
Mirror of Apache Gossip Incubator
incubator-iceberg
Apache Iceberg (Incubating)
incubator-shardingsphere
Distributed database middleware
kudu
Mirror of Apache Kudu
linux-insides
A little bit about the Linux kernel
logcabin
LogCabin is a distributed storage system built on Raft that provides a small amount of highly replicated, consistent storage. It is a reliable place for other distributed systems to store their core metadata and is helpful in solving cluster management issues.
ml-notes
Personal notes on ML :page_with_curl: :honeybee:
orc
Mirror of Apache ORC
parquet-mr
Apache Parquet
pulsar
Apache Pulsar - distributed pub-sub messaging system
ratis
A Java implementation of the Raft protocol for the Hadoop ecosystem
ray
A high-performance distributed execution engine
Recommenders
Recommender Systems
SparkRDMA
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
TPC-H-Hive
Running TPC-H on Apache Hive
tsdb
The Prometheus time series database layer.
xlearn
High-performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.