Sheng Qin's starred repositories

ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Language: Python · License: Apache-2.0 · Stargazers: 252

flux

Official inference repo for FLUX.1 models

Language: Python · License: Apache-2.0 · Stargazers: 5211
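
A minimal text-to-image sketch, going through the diffusers integration of FLUX.1 rather than this repo's own inference scripts; the checkpoint id and sampler settings below are illustrative assumptions.

```python
import torch
from diffusers import FluxPipeline

# Load the distilled FLUX.1-schnell checkpoint in bfloat16 and move it to the GPU.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# The schnell variant is tuned for very few steps and no classifier-free guidance.
image = pipe(
    "a photo of a red panda reading a book",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("flux_sample.png")
```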

evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Language: Python · License: Apache-2.0 · Stargazers: 133

LLM-Viewer

Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.

Language: Python · License: MIT · Stargazers: 245

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python · License: Apache-2.0 · Stargazers: 3963
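
A minimal sketch of SGLang's frontend language, assuming a server is already running locally; the endpoint address, prompt, and variable names are placeholders.

```python
import sglang as sgl

# Point the frontend at a locally running SGLang server (address is an assumption).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def answer(s, question):
    # Build a chat-style prompt and generate a short completion named "reply".
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("reply", max_tokens=64))

state = answer.run(question="What is quantization in one sentence?")
print(state["reply"])
```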

LLMRoofline

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Language: Jupyter Notebook · Stargazers: 63
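
A toy roofline calculation (not taken from this repo) showing the bound such comparisons rest on; the hardware peaks are illustrative A100-like numbers.

```python
# Roofline: attainable throughput is capped by either peak compute or
# memory bandwidth times arithmetic intensity. Numbers below are illustrative.
PEAK_TFLOPS = 312.0   # BF16 tensor-core peak, TFLOP/s
PEAK_BW_TBPS = 2.0    # HBM bandwidth, TB/s

def attainable_tflops(intensity_flop_per_byte: float) -> float:
    """min(peak compute, bandwidth * arithmetic intensity), in TFLOP/s."""
    return min(PEAK_TFLOPS, PEAK_BW_TBPS * intensity_flop_per_byte)

# Batch-1 decode on an fp16 LLM is roughly 2 FLOPs per 2-byte weight read,
# i.e. ~1 FLOP/byte, which lands deep in the memory-bound region.
for intensity in (1, 16, 64, 256):
    print(f"intensity {intensity:4d} FLOP/B -> {attainable_tflops(intensity):6.1f} TFLOP/s")
```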

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python · License: Apache-2.0 · Stargazers: 2107
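
A hedged sketch of post-training INT8 quantization with the 2.x `quantization.fit` entry point; the toy model and calibration data are placeholders standing in for a real PyTorch model and dataloader.

```python
import torch
from torch.utils.data import DataLoader
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder FP32 model and calibration data; both are assumptions for this sketch.
fp32_model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU())
calib_loader = DataLoader([(torch.randn(128), 0) for _ in range(8)], batch_size=2)

# Default post-training static quantization configuration (INT8).
conf = PostTrainingQuantConfig()
q_model = quantization.fit(model=fp32_model, conf=conf, calib_dataloader=calib_loader)
```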

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stargazers: 4206
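
A minimal 4-bit GPTQ quantization sketch; the model id and the single calibration sentence are placeholders chosen only to keep the example small.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # small placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights with group size 128 is the common GPTQ configuration.
quant_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quant_config)

# GPTQ calibrates layer by layer on a handful of tokenized examples.
examples = [tokenizer("GPTQ calibrates per-layer on sample text.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```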

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python · License: Apache-2.0 · Stargazers: 2962

bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python · License: MIT · Stargazers: 5859
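
A minimal sketch of 4-bit NF4 loading through the transformers integration of bitsandbytes; the model id is a placeholder and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # placeholder; any causal LM on the Hub works

# NF4 4-bit weights with bfloat16 compute, the usual QLoRA-style setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("k-bit quantization keeps weights in", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```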

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python · License: MIT · Stargazers: 1533
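
A minimal AWQ quantization sketch following the package's usual flow; the model path, output directory, and quantization settings are illustrative.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"  # placeholder; AWQ is typically applied to larger LLMs
quant_path = "opt-125m-awq"

# Common AWQ settings: 4-bit weights, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```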

llm-numbers

Numbers every LLM developer should know

Stargazers: 4023
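
One of the numbers in question, worked as a quick back-of-the-envelope script; the model shape is an assumed Llama-2-7B-like configuration without grouped-query attention.

```python
# KV-cache size = 2 (K and V) * layers * heads * head_dim * seq_len * batch * bytes/elem.
n_layers, n_kv_heads, head_dim = 32, 32, 128   # illustrative 7B-class shape
seq_len, batch, bytes_per_elem = 4096, 1, 2    # fp16/bf16 cache

kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
print(f"KV cache: {kv_cache_bytes / 2**30:.2f} GiB")  # 2.00 GiB for this configuration
```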

composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

Language: C++ · License: NOASSERTION · Stargazers: 268

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1433

xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Language: Python · License: Apache-2.0 · Stargazers: 198

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 959

long-context-attention

Sequence Parallel Attention for Long-Context LLM Training and Inference

Language: Python · Stargazers: 246

minisora

MiniSora: a community project that aims to explore the implementation path and future development direction of Sora.

Language: Python · License: Apache-2.0 · Stargazers: 1135

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1563

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

Language: Python · License: MIT · Stargazers: 11088

PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Language: Python · License: AGPL-3.0 · Stargazers: 1534

QPyTorch

Low Precision Arithmetic Simulation in PyTorch

Language: Python · License: MIT · Stargazers: 258
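
A minimal sketch of simulating a low-precision float format with QPyTorch; the chosen exponent/mantissa widths and rounding mode are illustrative.

```python
import torch
from qtorch.quant import float_quantize

x = torch.randn(4, 4)

# Simulate an FP8-like format (5 exponent bits, 2 mantissa bits) with
# round-to-nearest; the tensor stays fp32 but only representable values remain.
x_lp = float_quantize(x, exp=5, man=2, rounding="nearest")
print((x - x_lp).abs().max())
```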

brevitas

Brevitas: neural network quantization in PyTorch

Language: Python · License: NOASSERTION · Stargazers: 1133
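
A minimal quantization-aware module sketch with Brevitas layers; the layer sizes and bit widths are illustrative choices.

```python
import torch
import torch.nn as nn
from brevitas.nn import QuantLinear, QuantReLU

# Tiny QAT-style stack: 4-bit weights in the linear layers and a quantized
# activation in between. All widths here are illustrative assumptions.
model = nn.Sequential(
    QuantLinear(128, 64, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantLinear(64, 10, bias=True, weight_bit_width=4),
)

out = model(torch.randn(2, 128))
print(out.shape)
```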

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda · License: Apache-2.0 · Stargazers: 164

cccl

CUDA Core Compute Libraries

Language: C++ · License: NOASSERTION · Stargazers: 1046

Zelda64Recomp

Static recompilation of Majora's Mask (and soon Ocarina of Time) for PC (Windows/Linux)

Language: C · License: GPL-3.0 · Stargazers: 5148

N64Recomp

Tool to statically recompile N64 games into native executables

Language: C++ · License: MIT · Stargazers: 6214