Qingquan Song's starred repositories
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
Megatron-LM
Ongoing research training transformer models at scale
lm-evaluation-harness
A framework for few-shot evaluation of language models.
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
ThunderKittens
Tile primitives for speedy kernels
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
lovely-tensors
Tensors, for human consumption
resource-stream
CUDA related news and material links
generative-recommenders
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
ring-flash-attention
Ring attention implementation with flash attention
NeMo-Aligner
Scalable toolkit for efficient model alignment
Awesome-Generative-RecSys
A curated list of Generative Recommender Systems (Paper & Code)
optimizers
For optimization algorithm research and development.
triton-index
Cataloging released Triton kernels.