Uranus (UranusSeven)

Company: Xprobe

Uranus's starred repositories

MagicDec

Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Language: JavaScript · License: Apache-2.0 · Stargazers: 62 · Issues: 0
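
MagicDec builds on speculative decoding: a small draft model proposes several tokens and the large target model verifies them in one batched pass. A minimal greedy-acceptance sketch of that loop; `target_next` and `draft_next` are hypothetical toy stand-ins, not MagicDec's actual models or API:

```python
# Minimal sketch of greedy speculative decoding, the technique MagicDec
# builds on. `target_next` and `draft_next` are toy stand-ins for the large
# target model and the small draft model.
VOCAB = 100

def target_next(prefix):
    # Toy deterministic "large model". A real implementation scores all k
    # draft positions in ONE batched forward pass, which is where the
    # speedup comes from.
    return (sum(prefix) * 31 + len(prefix)) % VOCAB

def draft_next(prefix):
    # Toy cheap draft model that agrees with the target most of the time.
    t = target_next(prefix)
    return t if len(prefix) % 4 else (t + 1) % VOCAB

def speculative_step(prefix, k=4):
    # 1) Draft proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        proposal.append(draft_next(ctx))
        ctx.append(proposal[-1])
    # 2) Target verifies the proposal; greedy acceptance keeps the longest
    #    agreeing prefix, then substitutes the target's token and stops.
    accepted, ctx = [], list(prefix)
    for t in proposal:
        t_star = target_next(ctx)
        if t == t_star:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(t_star)  # target's correction at first mismatch
            break
    return accepted

print(speculative_step([1, 2, 3]))  # several tokens per target "call" on average
```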

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stargazers: 250 · Issues: 0

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 576 · Issues: 0

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 3151 · Issues: 0
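
For flavor, Triton kernels like Liger's follow a standard program structure: each program instance handles one block of elements, with masked loads and stores at the boundaries. A minimal elementwise kernel in that style (not one of Liger's actual fused kernels; requires a CUDA GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, n, alpha, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                # which block this instance owns
    offs = pid * BLOCK + tl.arange(0, BLOCK)   # element indices for this block
    mask = offs < n                            # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, alpha * x + y, mask=mask)

def scale_add(x, y, alpha=2.0):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)             # one program per 1024 elements
    scale_add_kernel[grid](x, y, out, n, alpha, BLOCK=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(scale_add(x, y), 2.0 * x + y)
```

Liger's kernels apply the same pattern to fused ops (RMSNorm, cross-entropy, etc.), where fusion avoids round-trips to HBM between steps.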

ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Language: Python · License: Apache-2.0 · Stargazers: 694 · Issues: 0

mem0

The Memory layer for your AI apps

Language: Python · License: Apache-2.0 · Stargazers: 22213 · Issues: 0

vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

Language: C · License: MIT · Stargazers: 195 · Issues: 0

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1525 · Issues: 0

compressed-tensors

A safetensors extension to efficiently store sparse quantized tensors on disk

Language: Python · License: Apache-2.0 · Stargazers: 34 · Issues: 0

ttt-lm-jax

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Language: Python · Stargazers: 350 · Issues: 0

LLM-Viewer

Analyze the inference of large language models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.

Language: Python · License: MIT · Stargazers: 285 · Issues: 0
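
The roofline part of such an analysis reduces to comparing a kernel's arithmetic intensity (FLOPs per byte of memory traffic) with the hardware's compute-to-bandwidth ratio. A back-of-the-envelope sketch, using rough A100-80GB figures as assumptions:

```python
# Back-of-the-envelope roofline check, in the spirit of what LLM-Viewer
# automates. Hardware numbers are rough A100-80GB figures (assumptions).
PEAK_FLOPS = 312e12                 # FP16 tensor-core peak, FLOP/s
PEAK_BW = 2.0e12                    # HBM bandwidth, byte/s
RIDGE = PEAK_FLOPS / PEAK_BW        # ~156 FLOP/byte; below this, memory-bound

def matmul_roofline(m, k, n, bytes_per_el=2):
    flops = 2 * m * k * n                               # multiply-adds
    traffic = bytes_per_el * (m * k + k * n + m * n)    # read A, B; write C
    intensity = flops / traffic
    bound = "compute-bound" if intensity > RIDGE else "memory-bound"
    t = max(flops / PEAK_FLOPS, traffic / PEAK_BW)      # roofline time estimate
    return intensity, bound, t

# Decode step at batch 1 is GEMV-like: intensity ~1 FLOP/byte, memory-bound.
print(matmul_roofline(1, 4096, 4096))
# A 4096-token prefill amortizes the same weights over many rows: compute-bound.
print(matmul_roofline(4096, 4096, 4096))
```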

lectures

Material for gpu-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2648 · Issues: 0

MInference

[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python · License: MIT · Stargazers: 726 · Issues: 0
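
As a generic illustration of dynamic sparse attention (not MInference's actual head-specific patterns), here is a toy NumPy version in which each query attends only to its top-k keys; a real kernel would never materialize the dense score matrix:

```python
# Toy top-k sparse attention: each query keeps only its k highest-scoring
# keys. Generic illustration only; real dynamic-sparse kernels estimate the
# pattern cheaply and skip the dense score computation entirely.
import numpy as np

def topk_sparse_attention(q, k_mat, v, k=8):
    d = q.shape[-1]
    scores = q @ k_mat.T / np.sqrt(d)              # (nq, nk), dense (toy only)
    thresh = np.sort(scores, axis=-1)[:, -k][:, None]   # k-th largest per query
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # softmax over surviving keys
    return w @ v

rng = np.random.default_rng(0)
q, km, v = rng.standard_normal((3, 128, 64))
print(topk_sparse_attention(q, km, v, k=8).shape)  # (128, 64)
```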

ringattention

Transformers with Arbitrarily Large Context

Language: Python · License: Apache-2.0 · Stargazers: 622 · Issues: 0
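
Ring attention shards the sequence across devices and circulates key/value blocks around a ring, combining partial results with an online-softmax accumulator so no device ever holds the full context. A single-process sketch of that accumulator, with the ring simulated by a plain loop over KV blocks:

```python
# Online-softmax accumulation over KV blocks: the numerical core of ring
# attention. The device ring is simulated by a Python loop; a real
# implementation overlaps this loop with neighbor-to-neighbor communication.
import numpy as np

def blockwise_attention(q, kv_blocks):
    d = q.shape[-1]
    acc = np.zeros_like(q)                   # running weighted sum of values
    m = np.full(q.shape[0], -np.inf)         # running row-max of scores
    l = np.zeros(q.shape[0])                 # running softmax denominator
    for k_blk, v_blk in kv_blocks:           # in ring attention: recv from neighbor
        s = q @ k_blk.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale the old accumulator
        p = np.exp(s - m_new[:, None])
        acc = acc * scale[:, None] + p @ v_blk
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 32))
blocks = [(rng.standard_normal((64, 32)), rng.standard_normal((64, 32)))
          for _ in range(4)]
k_full = np.concatenate([k for k, _ in blocks])
v_full = np.concatenate([v for _, v in blocks])
s = q @ k_full.T / np.sqrt(32)
ref = np.exp(s - s.max(-1, keepdims=True)); ref /= ref.sum(-1, keepdims=True)
assert np.allclose(blockwise_attention(q, blocks), ref @ v_full)
print("blockwise == full attention")
```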

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 201 · Issues: 0

SwiftTransformer

High-performance Transformer implementation in C++.

Language: C++ · Stargazers: 69 · Issues: 0

AI-Software-Startups

A Survey of AI startups

License: MIT · Stargazers: 392 · Issues: 0

onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1618 · Issues: 0

OpenHands

🙌 OpenHands: Code Less, Make More

Language: Python · License: MIT · Stargazers: 32815 · Issues: 0

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 1072 · Issues: 0

mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

Language: C++ · License: MIT · Stargazers: 235 · Issues: 0

Odysseus-Transformer

Odysseus: Playground of LLM Sequence Parallelism

Language: Python · Stargazers: 50 · Issues: 0

libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

Language: C++ · License: BSD-3-Clause · Stargazers: 96 · Issues: 0

flux

A fast communication-overlapping library for tensor parallelism on GPUs.

Language: C++ · License: Apache-2.0 · Stargazers: 200 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 2608 · Issues: 0

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

Language: C++ · License: Apache-2.0 · Stargazers: 278 · Issues: 0
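
The idea such tutorials build toward is tiling: decomposing C = A·B into block computations whose operands fit in fast memory. A NumPy sketch of the loop nest (real TensorOp kernels map these tiles onto shared memory, warps, and MMA instructions):

```python
# Tiled matmul in NumPy: the loop nest that GPU matmul tutorials refine into
# shared-memory + TensorOp kernels. Tiles of A and B are reused while
# accumulating into one C tile at a time.
import numpy as np

def tiled_matmul(a, b, tile=32):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % tile == 0 and n % tile == 0 and k % tile == 0
    c = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            acc = np.zeros((tile, tile))      # lives in registers on a GPU
            for p in range(0, k, tile):       # march along the K dimension
                acc += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
            c[i:i+tile, j:j+tile] = acc
    return c

rng = np.random.default_rng(0)
a, b = rng.standard_normal((128, 64)), rng.standard_normal((64, 96))
assert np.allclose(tiled_matmul(a, b), a @ b)
```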

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python · License: MIT · Stargazers: 220 · Issues: 0
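
The building block behind 2-bit KV-cache schemes like KIVI is asymmetric quantization: map each group's [min, max] range onto the 2^b integer levels. A generic round-trip sketch in NumPy; KIVI's per-channel key / per-token value grouping and bit-packing are omitted:

```python
# Generic asymmetric b-bit quantization round-trip: the building block behind
# 2-bit KV-cache schemes like KIVI (grouping and bit-packing omitted).
import numpy as np

def quantize_asym(x, bits=2, axis=-1):
    qmax = (1 << bits) - 1                    # 3 for 2-bit
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo                       # zero point folded into `lo`

def dequantize_asym(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 16))
q, scale, lo = quantize_asym(kv, bits=2)
err = np.abs(dequantize_asym(q, scale, lo) - kv).max()
print(q.dtype, err)   # uint8 codes (2 bits used); error bounded by scale / 2
```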