About
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
Languages
Python 81.0%, Cuda 14.9%, C++ 2.9%, CMake 0.6%, Shell 0.4%, Dockerfile 0.1%, C 0.1%, Jinja 0.1%