cchenax's repositories
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ambry
Distributed object store
cassandra
Mirror of Apache Cassandra
ClickHouse
ClickHouse® is a free analytics DBMS for big data
cockroach
CockroachDB - the open source, cloud-native distributed SQL database.
Copysets
CS 244 Reproduction of Copysets
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
ecwide
USENIX FAST 2021, "Exploiting Combined Locality for Wide-Stripe Erasure Coding in Distributed Storage"
educative.io_courses
PDF downloads of all the educative.io courses included in the free student subscription from the GitHub Student Developer Pack
flink
Apache Flink
Grokking-the-System-Design
Grokking the system design interview course materials
hadoop-20
Facebook's Realtime Distributed FS based on Apache Hadoop 0.20-append
hops
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
incubator-ratis
Open-source Java implementation of the Raft consensus protocol.
litdata
Transform datasets at scale. Optimize datasets for fast AI model training.
milvus
A cloud-native vector database, storage for next generation AI applications
minio
The Object Store for AI Data Infrastructure
pinot
Apache Pinot - A realtime distributed OLAP datastore
raft-zh_cn
Chinese translation of the Raft consensus algorithm paper
redpanda
Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
repairboost-code
This is the implementation of RepairBoost described in our paper "Boosting Full-Node Repair in Erasure-Coded Storage" appeared in USENIX ATC'21.
seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
system-design
Learn how to design systems at scale and prepare for system design interviews
system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
task-scheduler
A fault tolerant distributed task scheduler simulation
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)