About
A high-throughput and memory-efficient inference and serving engine for LLMs
https://vllm.readthedocs.io
Apache License 2.0
Languages
Python 80.4%
Cuda 14.0%
C++ 3.9%
CMake 0.8%
Shell 0.5%
Dockerfile 0.2%
C 0.1%
Jinja 0.1%