Qingquan Song's starred repositories

unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · Stargazers: 16642 · Issues: 117 · Issues: 875
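
A minimal sketch of how a 4-bit LoRA finetune is typically set up with unsloth; the model id, sequence length, and LoRA hyperparameters below are placeholder assumptions rather than values taken from this page.

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (model id and settings are placeholders).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these low-rank weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```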

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.

Language: Python · License: Apache-2.0 · Stargazers: 12493 · Issues: 133 · Issues: 210
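
To make the "RNN-mode inference" claim concrete, here is a toy linear-recurrence sketch: the per-token state has fixed size, unlike a growing KV cache. This is a deliberately simplified stand-in, not RWKV's actual time-mixing formula (which uses learned, channel-wise decays and extra terms).

```python
import torch

def rnn_mode_step(state, k, v, q, decay=0.95):
    """Toy linear-attention recurrence: O(1) state per token instead of a growing KV cache."""
    state = decay * state + torch.outer(k, v)  # fold the new token into a fixed-size state
    out = q @ state                            # read out with the current query
    return out, state

d = 8
state = torch.zeros(d, d)
for _ in range(5):                             # stream tokens one at a time
    k, v, q = torch.randn(d), torch.randn(d), torch.randn(d)
    out, state = rnn_mode_step(state, k, v, q)
```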

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model), and we hope the open-source community will contribute to it.

Language: Python · License: MIT · Stargazers: 11309 · Issues: 160 · Issues: 303

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 10210 · Issues: 162 · Issues: 735

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python · License: MIT · Stargazers: 6644 · Issues: 37 · Issues: 1097
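
A small example of the harness's Python entry point; the model id and task names below are placeholders, and the `lm_eval` CLI wraps the same call.

```python
import lm_eval

# Run a few-shot evaluation of a Hugging Face model on two standard tasks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Llama-3.2-1B",
    tasks=["hellaswag", "arc_easy"],
    num_fewshot=5,
)
print(results["results"])
```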

bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python · License: MIT · Stargazers: 5756 · Issues: 48 · Issues: 968
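
For instance, bitsandbytes' 8-bit optimizers can be swapped in for their full-precision counterparts; the single linear layer below is just a stand-in for a real model.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Drop-in replacement for torch.optim.Adam: optimizer states are stored in 8-bit,
# which substantially cuts optimizer memory for large models.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
```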

torchtune

PyTorch native finetuning library

Language: Python · License: BSD-3-Clause · Stargazers: 4101 · Issues: 46 · Issues: 593

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2487 · Issues: 35 · Issues: 7

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1864 · Issues: 34 · Issues: 325
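
A rough sketch of the FP8 usage pattern on supported GPUs (Hopper/Ada); the layer sizes and the default delayed-scaling recipe below are placeholder choices.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024).cuda()   # drop-in replacement for torch.nn.Linear
fp8_recipe = recipe.DelayedScaling()   # default delayed-scaling FP8 recipe

x = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                        # the matmul runs in FP8 on supported hardware
```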

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1716 · Issues: 21 · Issues: 66

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1525 · Issues: 25 · Issues: 26

alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1466 · Issues: 7 · Issues: 142

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · Stargazers: 1384 · Issues: 18 · Issues: 52
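
The repository ships optimizer classes such as GaLoreAdamW; below is a rough usage sketch assuming the documented parameter-group keys (rank, update_proj_gap, scale, proj_type), with a toy model standing in for a real transformer.

```python
import torch
from galore_torch import GaLoreAdamW

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.Linear(512, 512))

# Project gradients of 2-D weight matrices into a low-rank subspace; other
# parameters (biases, norms) keep ordinary dense optimizer states.
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]
param_groups = [
    {"params": other_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-4)
```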

ao

PyTorch native quantization and sparsity for training and inference

Language: Python · License: BSD-3-Clause · Stargazers: 1321 · Issues: 40 · Issues: 232
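
A rough sketch of torchao's weight-only quantization entry point; the API names track recent torchao releases and may differ across versions, and the model is a placeholder.

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda()

# Swap each Linear's weights to int8 in place for weight-only quantized inference.
quantize_(model, int8_weight_only())
```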

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 1252 · Issues: 27 · Issues: 44

lovely-tensors

Tensors, for human consumption

Language: Jupyter Notebook · License: MIT · Stargazers: 1104 · Issues: 10 · Issues: 22
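
The library's main trick is a one-line monkey patch so tensor reprs show a summary instead of raw values:

```python
import torch
import lovely_tensors as lt

lt.monkey_patch()          # replace Tensor.__repr__ with a compact summary
x = torch.randn(64, 128)
print(x)                   # prints shape, element count, range, and mean/std rather than raw numbers
```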

resource-stream

CUDA-related news and material links

LOMO

LOMO: LOw-Memory Optimization

Language: Python · License: MIT · Stargazers: 976 · Issues: 13 · Issues: 70

generative-recommenders

Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Language: Python · License: Apache-2.0 · Stargazers: 676 · Issues: 24 · Issues: 44

hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python · License: Apache-2.0 · Stargazers: 674 · Issues: 16 · Issues: 96
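
A rough sketch of quantizing a single linear layer with HQQ, assuming the BaseQuantizeConfig / HQQLinear interface from the project's README; the bit width and group size are placeholder choices.

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

linear = torch.nn.Linear(4096, 4096)
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

# Replace the float layer with a 4-bit HQQ-quantized equivalent.
qlinear = HQQLinear(linear, quant_config=quant_config,
                    compute_dtype=torch.float16, device="cuda")
```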

llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Language: Python · License: Apache-2.0 · Stargazers: 548 · Issues: 12 · Issues: 62

ring-flash-attention

Ring attention implementation with flash attention

Language: Python · License: MIT · Stargazers: 548 · Issues: 10 · Issues: 32

NeMo-Aligner

Scalable toolkit for efficient model alignment

Language: Python · License: Apache-2.0 · Stargazers: 542 · Issues: 16 · Issues: 71

orpo

Official repository for ORPO

Language: Python · License: Apache-2.0 · Stargazers: 414 · Issues: 6 · Issues: 27

Awesome-Generative-RecSys

A curated list of Generative Recommender Systems (Paper & Code)

Adam-mini

Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793)
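
The core idea, per the paper's title, is to keep far fewer second-moment (learning-rate) scalars than Adam does. The toy per-block update below illustrates that idea only; it is not the repository's implementation, and bias correction is omitted.

```python
import torch

def adam_mini_style_step(param, grad, m, v_block, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Toy sketch: one second-moment scalar per parameter block instead of one per
    coordinate, so Adam's v-state shrinks to a handful of scalars."""
    m.mul_(b1).add_(grad, alpha=1 - b1)                        # per-coordinate momentum, as in Adam
    v_block = b2 * v_block + (1 - b2) * grad.pow(2).mean()     # a single scalar for the whole block
    param.data.add_(m / (v_block.sqrt() + eps), alpha=-lr)
    return v_block
```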

optimizers

For optimization algorithm research and development.

Language: Python · License: NOASSERTION · Stargazers: 255 · Issues: 14 · Issues: 12

triton-index

Cataloging released Triton kernels.

License: Apache-2.0 · Stargazers: 113 · Issues: 4 · Issues: 0

QuantEase

QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. It leverages coordinate descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.

Language: Python · License: BSD-2-Clause · Stargazers: 17 · Issues: 7 · Issues: 0
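
To make the coordinate-descent framing concrete, here is a toy sketch of the general idea for one output row: cyclically re-solve each weight against the layer's input Gram matrix and round it to the quantization grid. This is an illustrative simplification of the technique, not the repository's algorithm verbatim, and the grid and data below are made up.

```python
import torch

def quantize_row_cd(w, H, grid, n_sweeps=10):
    """Toy layer-wise quantization by coordinate descent (illustrative simplification).

    Minimizes (w - q)^T H (w - q) over q restricted to `grid`, where H = X @ X.T is the
    Gram matrix of the layer's calibration inputs. Each pass re-solves one coordinate in
    closed form and rounds it to the nearest grid value; no matrix inversion is needed.
    """
    q = grid[torch.argmin((w[:, None] - grid[None, :]).abs(), dim=1)]  # start from nearest rounding
    for _ in range(n_sweeps):
        for j in range(w.numel()):
            r = w - q
            # Unconstrained minimizer for coordinate j with all other coordinates fixed.
            q_star = w[j] + (H[j] @ r - H[j, j] * r[j]) / H[j, j]
            q[j] = grid[torch.argmin((q_star - grid).abs())]           # project onto the grid
    return q

# Tiny usage example with random calibration data and an assumed symmetric grid.
d = 16
X = torch.randn(d, 128)
H = X @ X.T + 1e-3 * torch.eye(d)      # regularize so diagonal entries stay positive
w = torch.randn(d)
grid = torch.linspace(-1.0, 1.0, 8)
q = quantize_row_cd(w, H, grid)
```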