vLLM's repositories
llm-compressor
Transformers-compatible library for applying quantization, sparsity, and other compression algorithms to LLMs for optimized deployment with vLLM (one-shot quantization sketch after this list)
semantic-router
Intelligent router for Mixture-of-Models
production-stack
vLLM’s reference system for K8s-native, cluster-wide deployment with community-driven performance optimization
vllm-ascend
Community-maintained hardware plugin for vLLM on Huawei Ascend
compressed-tensors
A safetensors extension for efficiently storing sparse and quantized tensors on disk (save/load sketch after this list)
tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support
flash-attention
Fast and memory-efficient exact attention (usage sketch after this list)
speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM (speculative-decoding sketch after this list)
vllm-spyre
Community-maintained hardware plugin for vLLM on IBM Spyre
vllm-gaudi
Community-maintained hardware plugin for vLLM on Intel Gaudi
vllm-neuron
Community-maintained hardware plugin for vLLM on AWS Neuron
vllm-xpu-kernels
Custom XPU kernels for running vLLM on Intel GPUs
DeepGEMM
Clean and efficient FP8 GEMM kernels with fine-grained scaling
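
The entries above are terse, so a few usage sketches follow. For llm-compressor, a one-shot post-training quantization run might look like the sketch below. The model name, dataset, scheme, and output path are illustrative choices, and import paths have moved between releases, so treat this as the shape of the workflow rather than the exact current API.

    # Hedged sketch of llm-compressor's one-shot quantization flow.
    # Import paths and kwargs have shifted across releases; the model,
    # dataset, scheme, and output_dir below are illustrative choices.
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # any HF causal LM
        dataset="open_platypus",                     # calibration set
        recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
        output_dir="TinyLlama-1.1B-W4A16",           # point vLLM at this path
    )

The saved directory can then be served directly, e.g. vllm serve TinyLlama-1.1B-W4A16.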
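
For compressed-tensors, the round trip is a save/load pair. The helper names below (save_compressed, load_compressed, BitmaskConfig) follow the project's README at the time of writing and should be read as an assumption about the current API.

    # Hedged sketch: save a mostly-zero tensor with bitmask compression,
    # then load it back. Helper names are assumptions based on the README.
    import torch
    from compressed_tensors import BitmaskConfig, load_compressed, save_compressed

    tensors = {"w": torch.tensor([[0.0, 0.0, 1.0], [0.0, 2.0, 0.0]])}  # sparse

    save_compressed(tensors, "model.safetensors",
                    compression_format=BitmaskConfig().format)
    restored = load_compressed("model.safetensors",
                               compression_config=BitmaskConfig())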
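
For flash-attention, the functional interface takes (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on a CUDA device; the shapes below are arbitrary.

    # Exact attention without materializing the seqlen x seqlen score matrix.
    # Requires a CUDA device and fp16/bf16 inputs.
    import torch
    from flash_attn import flash_attn_func

    b, s, h, d = 2, 1024, 8, 64
    q = torch.randn(b, s, h, d, dtype=torch.float16, device="cuda")
    k, v = torch.randn_like(q), torch.randn_like(q)

    out = flash_attn_func(q, k, v, causal=True)  # shape (2, 1024, 8, 64)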
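
speculators packages draft models for vLLM's speculative decoding. As an illustration of the consuming side (vLLM's speculative_config, not the speculators API itself), an n-gram drafter can be enabled as below; the config keys follow recent vLLM docs and may change between versions.

    # Hedged sketch: speculative decoding in vLLM with an n-gram drafter.
    # speculative_config keys follow recent vLLM docs; they may change.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative target model
        speculative_config={
            "method": "ngram",
            "num_speculative_tokens": 5,  # draft tokens proposed per step
            "prompt_lookup_max": 4,       # longest n-gram matched in the prompt
        },
    )
    print(llm.generate(["vLLM is"], SamplingParams(max_tokens=32))[0].outputs[0].text)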