Zhang Cao's starred repositories
long-context-attention
Sequence Parallel Attention for Long-Context LLM Training and Inference
ThunderKittens
Tile primitives for speedy kernels
nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
prophet-rocksdb
[MSST '24] Prophet: Optimizing LSM-Based Key-Value Store on ZNS SSDs with File Lifetime Prediction and Compaction Compensation.
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks