ZZK's repositories
flashinfer
FlashInfer: Kernel Library for LLM Serving
cutlass_master
CUDA Templates for Linear Algebra Subroutines
APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
auto-round
SOTA Weight-only Quantization Algorithm for LLMs
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
cccl
CUDA C++ Core Libraries
cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it
EETQ
Easy and Efficient Quantization for Transformers
float8_experimental
This repository contains the experimental PyTorch native float8 training UX
fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
GPUSorting
OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
qllm-eval
Code repository for "Evaluating Quantized Large Language Models"
quanto
A PyTorch quantization toolkit
QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
triton
Development repository for the Triton language and compiler
Triton-Puzzles
Puzzles for learning Triton