Maozhou Ge's starred repositories
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
ThunderKittens
Tile primitives for speedy kernels
torchtitan
A native PyTorch library for large model training
Skywork
Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have been open-sourced.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
long-context-attention
Sequence Parallel Attention for Long-Context LLM Training and Inference
nccl-rdma-sharp-plugins
RDMA and SHARP plugins for the NCCL library
modern-latex
A short guide to LaTeX that avoids legacy cruft.
ml-systems-papers
Curated collection of papers in machine learning systems
grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.