Uranus (UranusSeven)

Company: Xprobe

Uranus's starred repositories

MagicDec

Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Language: JavaScript · License: Apache-2.0 · Stargazers: 62 · Issues: 0
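
MagicDec builds on speculative decoding: a small draft model proposes several tokens and the large target model verifies them in one batched pass. A minimal greedy-acceptance sketch of that loop; `target_next` and `draft_next` are hypothetical toy stand-ins, not MagicDec's actual models or API:

```python
# Minimal sketch of greedy speculative decoding, the technique MagicDec
# builds on. `target_next` and `draft_next` are toy stand-ins for the large
# target model and the small draft model.
VOCAB = 100

def target_next(prefix):
    # Toy deterministic "large model". A real implementation scores all k
    # draft positions in ONE batched forward pass, which is where the
    # speedup comes from.
    return (sum(prefix) * 31 + len(prefix)) % VOCAB

def draft_next(prefix):
    # Toy cheap draft model that agrees with the target most of the time.
    t = target_next(prefix)
    return t if len(prefix) % 4 else (t + 1) % VOCAB

def speculative_step(prefix, k=4):
    # 1) Draft proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        proposal.append(draft_next(ctx))
        ctx.append(proposal[-1])
    # 2) Target verifies the proposal; greedy acceptance keeps the longest
    #    agreeing prefix, then substitutes the target's token and stops.
    accepted, ctx = [], list(prefix)
    for t in proposal:
        t_star = target_next(ctx)
        if t == t_star:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(t_star)  # target's correction at first mismatch
            break
    return accepted

print(speculative_step([1, 2, 3]))  # several tokens per target "call" on average
```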

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stargazers: 250 · Issues: 0

Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Language: Cuda · License: Apache-2.0 · Stargazers: 576 · Issues: 0

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 3151 · Issues: 0
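
For flavor, Triton kernels like Liger's follow a standard program structure: each program instance handles one block of elements, with masked loads and stores at the boundaries. A minimal elementwise kernel in that style (not one of Liger's actual fused kernels; requires a CUDA GPU):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, n, alpha, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)                # which block this instance owns
    offs = pid * BLOCK + tl.arange(0, BLOCK)   # element indices for this block
    mask = offs < n                            # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, alpha * x + y, mask=mask)

def scale_add(x, y, alpha=2.0):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)             # one program per 1024 elements
    scale_add_kernel[grid](x, y, out, n, alpha, BLOCK=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(scale_add(x, y), 2.0 * x + y)
```

Liger's kernels apply the same pattern to fused ops (RMSNorm, cross-entropy, etc.), where fusion avoids round-trips to HBM between steps.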

ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Language: Python · License: Apache-2.0 · Stargazers: 694 · Issues: 0

mem0

The Memory layer for your AI apps

Language: Python · License: Apache-2.0 · Stargazers: 22213 · Issues: 0

vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

Language: C · License: MIT · Stargazers: 195 · Issues: 0

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda · License: MIT · Stargazers: 1525 · Issues: 0

compressed-tensors

A safetensors extension to efficiently store sparse quantized tensors on disk

Language: Python · License: Apache-2.0 · Stargazers: 34 · Issues: 0

ttt-lm-jax

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Language: Python · Stargazers: 350 · Issues: 0

LLM-Viewer

Analyze the inference of large language models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.

Language: Python · License: MIT · Stargazers: 285 · Issues: 0
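
The roofline part of such an analysis reduces to comparing a kernel's arithmetic intensity (FLOPs per byte of memory traffic) with the hardware's compute-to-bandwidth ratio. A back-of-the-envelope sketch, using rough A100-80GB figures as assumptions:

```python
# Back-of-the-envelope roofline check, in the spirit of what LLM-Viewer
# automates. Hardware numbers are rough A100-80GB figures (assumptions).
PEAK_FLOPS = 312e12                 # FP16 tensor-core peak, FLOP/s
PEAK_BW = 2.0e12                    # HBM bandwidth, byte/s
RIDGE = PEAK_FLOPS / PEAK_BW        # ~156 FLOP/byte; below this, memory-bound

def matmul_roofline(m, k, n, bytes_per_el=2):
    flops = 2 * m * k * n                               # multiply-adds
    traffic = bytes_per_el * (m * k + k * n + m * n)    # read A, B; write C
    intensity = flops / traffic
    bound = "compute-bound" if intensity > RIDGE else "memory-bound"
    t = max(flops / PEAK_FLOPS, traffic / PEAK_BW)      # roofline time estimate
    return intensity, bound, t

# Decode step at batch 1 is GEMV-like: intensity ~1 FLOP/byte, memory-bound.
print(matmul_roofline(1, 4096, 4096))
# A 4096-token prefill amortizes the same weights over many rows: compute-bound.
print(matmul_roofline(4096, 4096, 4096))
```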

lectures

Material for gpu-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2648 · Issues: 0

MInference

[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Language: Python · License: MIT · Stargazers: 726 · Issues: 0
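
As a generic illustration of dynamic sparse attention (not MInference's actual head-specific patterns), here is a toy NumPy version in which each query attends only to its top-k keys; a real kernel would never materialize the dense score matrix:

```python
# Toy top-k sparse attention: each query keeps only its k highest-scoring
# keys. Generic illustration only; real dynamic-sparse kernels estimate the
# pattern cheaply and skip the dense score computation entirely.
import numpy as np

def topk_sparse_attention(q, k_mat, v, k=8):
    d = q.shape[-1]
    scores = q @ k_mat.T / np.sqrt(d)              # (nq, nk), dense (toy only)
    thresh = np.sort(scores, axis=-1)[:, -k][:, None]   # k-th largest per query
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # softmax over surviving keys
    return w @ v

rng = np.random.default_rng(0)
q, km, v = rng.standard_normal((3, 128, 64))
print(topk_sparse_attention(q, km, v, k=8).shape)  # (128, 64)
```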

ringattention

Transformers with Arbitrarily Large Context

Language: Python · License: Apache-2.0 · Stargazers: 622 · Issues: 0
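
Ring attention shards the sequence across devices and circulates key/value blocks around a ring, combining partial results with an online-softmax accumulator so no device ever holds the full context. A single-process sketch of that accumulator, with the ring simulated by a plain loop over KV blocks:

```python
# Online-softmax accumulation over KV blocks: the numerical core of ring
# attention. The device ring is simulated by a Python loop; a real
# implementation overlaps this loop with neighbor-to-neighbor communication.
import numpy as np

def blockwise_attention(q, kv_blocks):
    d = q.shape[-1]
    acc = np.zeros_like(q)                   # running weighted sum of values
    m = np.full(q.shape[0], -np.inf)         # running row-max of scores
    l = np.zeros(q.shape[0])                 # running softmax denominator
    for k_blk, v_blk in kv_blocks:           # in ring attention: recv from neighbor
        s = q @ k_blk.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)            # rescale the old accumulator
        p = np.exp(s - m_new[:, None])
        acc = acc * scale[:, None] + p @ v_blk
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 32))
blocks = [(rng.standard_normal((64, 32)), rng.standard_normal((64, 32)))
          for _ in range(4)]
k_full = np.concatenate([k for k, _ in blocks])
v_full = np.concatenate([v for _, v in blocks])
s = q @ k_full.T / np.sqrt(32)
ref = np.exp(s - s.max(-1, keepdims=True)); ref /= ref.sum(-1, keepdims=True)
assert np.allclose(blockwise_attention(q, blocks), ref @ v_full)
print("blockwise == full attention")
```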

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 201 · Issues: 0

SwiftTransformer

High-performance Transformer implementation in C++.

Language: C++ · Stargazers: 69 · Issues: 0

AI-Software-Startups

A Survey of AI startups

License: MIT · Stargazers: 392 · Issues: 0

onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1618 · Issues: 0

OpenHands

🙌 OpenHands: Code Less, Make More

Language: Python · License: MIT · Stargazers: 32815 · Issues: 0

Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stargazers: 1072 · Issues: 0

mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

Language: C++ · License: MIT · Stargazers: 235 · Issues: 0

Odysseus-Transformer

Odysseus: Playground of LLM Sequence Parallelism

Language: Python · Stargazers: 50 · Issues: 0

libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

Language: C++ · License: BSD-3-Clause · Stargazers: 96 · Issues: 0

flux

A fast communication-overlapping library for tensor parallelism on GPUs.

Language: C++ · License: Apache-2.0 · Stargazers: 200 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 2608 · Issues: 0

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

Language: C++ · License: Apache-2.0 · Stargazers: 278 · Issues: 0
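
The idea such tutorials build toward is tiling: decomposing C = A·B into block computations whose operands fit in fast memory. A NumPy sketch of the loop nest (real TensorOp kernels map these tiles onto shared memory, warps, and MMA instructions):

```python
# Tiled matmul in NumPy: the loop nest that GPU matmul tutorials refine into
# shared-memory + TensorOp kernels. Tiles of A and B are reused while
# accumulating into one C tile at a time.
import numpy as np

def tiled_matmul(a, b, tile=32):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % tile == 0 and n % tile == 0 and k % tile == 0
    c = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            acc = np.zeros((tile, tile))      # lives in registers on a GPU
            for p in range(0, k, tile):       # march along the K dimension
                acc += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
            c[i:i+tile, j:j+tile] = acc
    return c

rng = np.random.default_rng(0)
a, b = rng.standard_normal((128, 64)), rng.standard_normal((64, 96))
assert np.allclose(tiled_matmul(a, b), a @ b)
```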

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python · License: MIT · Stargazers: 220 · Issues: 0
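
The building block behind 2-bit KV-cache schemes like KIVI is asymmetric quantization: map each group's [min, max] range onto the 2^b integer levels. A generic round-trip sketch in NumPy; KIVI's per-channel key / per-token value grouping and bit-packing are omitted:

```python
# Generic asymmetric b-bit quantization round-trip: the building block behind
# 2-bit KV-cache schemes like KIVI (grouping and bit-packing omitted).
import numpy as np

def quantize_asym(x, bits=2, axis=-1):
    qmax = (1 << bits) - 1                    # 3 for 2-bit
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo                       # zero point folded into `lo`

def dequantize_asym(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 16))
q, scale, lo = quantize_asym(kv, bits=2)
err = np.abs(dequantize_asym(q, scale, lo) - kv).max()
print(q.dtype, err)   # uint8 codes (2 bits used); error bounded by scale / 2
```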