Yu Zhang's starred repositories
flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
Rewrite-the-Stars
[CVPR 2024] Rewrite the Stars
long-context-attention
Sequence-parallel attention for long-context LLM training and inference
hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
Counting-Stars
Counting-Stars (★)
flash_attn_jax
JAX bindings for Flash Attention v2
gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
GORU-tensorflow
Gated Orthogonal Recurrent Unit (GORU) implementation in TensorFlow
ParallelTokenizer
Run the tokenizer in parallel for substantial speedups
based-evaluation-harness
A framework for few-shot evaluation of language models.
LLMTest_NeedleInAHaystack_HFModel
Supports Hugging Face models for simple needle-in-a-haystack retrieval tests at various context lengths to measure LLM accuracy