Michael Goin's starred repositories
flash-attention
Fast and memory-efficient exact attention
Liger-Kernel
Efficient Triton Kernels for LLM Training
ThunderKittens
Tile primitives for speedy kernels
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques, including quantization, pruning, and distillation. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM or TensorRT to speed up inference on NVIDIA GPUs.
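The quantization these libraries apply can be illustrated with the simplest case: symmetric per-tensor int8 quantization, where each float weight is mapped to an 8-bit integer via a single scale. A minimal pure-Python sketch (the real libraries operate on framework tensors, not lists):

```python
def quantize_int8(weights):
    """Map float weights to int8 using a per-tensor symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)  # close to the original weights
```

The storage cost drops from 32 bits to 8 bits per weight, at the price of a small rounding error bounded by half the scale.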
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
cold-compress
Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods, built on top of GPT-Fast, a simple PyTorch-native generation codebase.
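One of the simplest KV-cache eviction policies a toolkit like this can benchmark is keeping a few initial "attention sink" tokens plus a window of the most recent tokens, dropping everything in between. A hypothetical sketch (names here are illustrative, not Cold Compress's API):

```python
def compress_cache(cache, n_sink=2, window=4):
    """Evict middle entries from a per-token KV cache (oldest first).

    Keeps the first n_sink entries and the last `window` entries,
    discarding the rest once the cache exceeds n_sink + window.
    """
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

cache = list(range(10))           # stand-in for 10 tokens' KV entries
compressed = compress_cache(cache)
# keeps entries 0, 1 (sinks) and 6, 7, 8, 9 (recent window)
```

The cache size is now bounded by `n_sink + window` regardless of sequence length, which is the memory saving such methods trade against accuracy.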
TensorRT-Incubator
Experimental projects related to TensorRT
Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
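The 2:4 ("two out of four") pattern means that in every group of four consecutive weights, at most two are nonzero, a structure NVIDIA GPUs can accelerate in hardware. A pure-Python sketch of magnitude pruning to this pattern (an illustration of the sparsity format, not of the Marlin kernel itself):

```python
def prune_2_4(weights):
    """Zero out the two smallest-magnitude weights in each group of 4."""
    assert len(weights) % 4 == 0
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude weights in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]))[2:]
        out.extend(g if j in keep else 0.0 for j, g in enumerate(group))
    return out

w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned = prune_2_4(w)
# → [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the pattern is regular, a kernel only needs 2 values plus a small index per group, which is what makes the sparse matmul both compact and fast.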
compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
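The core idea behind bitmask-style sparse storage can be shown in a few lines: persist only the nonzero values plus a mask marking their positions. A pure-Python sketch (the real library packs the mask into bits and works on torch tensors serialized via safetensors):

```python
def compress(dense):
    """Split a dense list into a position mask and its nonzero values."""
    mask = [v != 0 for v in dense]
    values = [v for v in dense if v != 0]
    return mask, values

def decompress(mask, values):
    """Rebuild the dense list from the mask and nonzero values."""
    it = iter(values)
    return [next(it) if m else 0 for m in mask]

dense = [0, 3, 0, 0, 7, 0, 1, 0]
mask, values = compress(dense)     # values: [3, 7, 1]
restored = decompress(mask, values)
```

For a tensor that is mostly zeros, storing one bit per position plus the few nonzero values is far smaller than storing every element, which is the disk saving the extension targets.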