Sheng Qin's starred repositories
ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
LLM-Viewer
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
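A minimal sketch of the roofline calculation these two tools are built around, assuming illustrative A100-class hardware numbers (not taken from either repo):

```python
# Roofline model: attainable throughput is capped by either peak compute
# or memory bandwidth times arithmetic intensity (FLOPs per byte moved).

def roofline_tflops(peak_tflops: float, bandwidth_tbps: float,
                    arithmetic_intensity: float) -> float:
    """Attainable TFLOP/s = min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(peak_tflops, bandwidth_tbps * arithmetic_intensity)

# Example: an A100-class GPU (~312 FP16 TFLOP/s, ~2.0 TB/s HBM).
# Batch-1 LLM decoding streams every FP16 weight once per token
# (~2 FLOPs per 2-byte weight), so its arithmetic intensity is roughly
# 1 FLOP/byte -- deep in the memory-bound region of the roofline.
print(roofline_tflops(312.0, 2.0, 1.0))    # ~2 TFLOP/s, bandwidth-bound
print(roofline_tflops(312.0, 2.0, 500.0))  # 312 TFLOP/s, compute-bound
```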
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
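A toy sketch of the simplest scheme on that list, symmetric (absmax) INT8 quantization; this illustrates the concept only and is not the neural-compressor API:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0  # map [-max|w|, max|w|] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())  # small round-off error
```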
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
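A toy sketch of GPTQ's core update, after the paper: quantize one weight column at a time and fold the quantization error into the not-yet-quantized columns via the inverse Hessian of the layer inputs. `quantize_rtn` is a stand-in for the real 4-bit grid; the Cholesky formulation and grouping are omitted:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    # Round-to-nearest onto a symmetric low-bit grid (stand-in quantizer).
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels + 1e-12
    return np.clip(np.round(w / scale), -levels, levels) * scale

def gptq_quantize(W: np.ndarray, H_inv: np.ndarray) -> np.ndarray:
    """W: (rows, cols) weights; H_inv: (cols, cols) inverse input Hessian."""
    W = W.copy()
    for i in range(W.shape[1]):
        w = W[:, i]
        q = quantize_rtn(w)
        W[:, i] = q
        err = (w - q) / H_inv[i, i]
        # Compensate the columns that have not been quantized yet.
        W[:, i + 1:] -= np.outer(err, H_inv[i, i + 1:])
    return W
```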
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
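A sketch of the common way bitsandbytes is consumed, through the Hugging Face transformers integration; the model name is just an example, and any causal LM on the Hub loads the same way:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)
```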
llm-numbers
Numbers every LLM developer should know
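One of those numbers worked out: KV-cache size per token for a Llama-2-7B-shaped model (32 layers, hidden size 4096, full multi-head attention, FP16), assuming no GQA:

```python
n_layers, d_model, bytes_fp16 = 32, 4096, 2
kv_per_token = 2 * n_layers * d_model * bytes_fp16  # K and V for every layer
print(kv_per_token)                   # 524288 bytes = 0.5 MiB per token
print(4096 * kv_per_token / 2**30)    # a 4k-token context: ~2 GiB of KV cache
```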
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
ThunderKittens
Tile primitives for speedy kernels
long-context-attention
Sequence-parallel attention for long-context LLM training and inference
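A single-process NumPy simulation of the idea behind ring-style sequence parallelism: each rank keeps one Q shard while K/V blocks circulate around the ring, merged with an online (streaming) softmax. This sketch ignores causal masking and communication and is not the repo's API:

```python
import numpy as np

def ring_attention_sim(Q, K, V, ranks=4):
    d = Q.shape[-1]
    out = []
    for q in np.split(Q, ranks):                # each rank's Q shard
        m = np.full((q.shape[0], 1), -np.inf)   # running row max
        l = np.zeros((q.shape[0], 1))           # running softmax denominator
        acc = np.zeros_like(q)                  # running weighted sum of V
        for k, v in zip(np.split(K, ranks), np.split(V, ranks)):
            s = q @ k.T / np.sqrt(d)            # scores vs. this K/V block
            m_new = np.maximum(m, s.max(-1, keepdims=True))
            scale = np.exp(m - m_new)           # rescale earlier blocks
            p = np.exp(s - m_new)
            l = l * scale + p.sum(-1, keepdims=True)
            acc = acc * scale + p @ v
            m = m_new
        out.append(acc / l)
    return np.vstack(out)

# Matches vanilla full attention:
Q, K, V = (np.random.randn(16, 8) for _ in range(3))
s = Q @ K.T / np.sqrt(8)
p = np.exp(s - s.max(-1, keepdims=True))
ref = (p / p.sum(-1, keepdims=True)) @ V
assert np.allclose(ring_attention_sim(Q, K, V), ref)
```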
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Zelda64Recomp
Static recompilation of Majora's Mask (and soon Ocarina of Time) for PC (Windows/Linux)