cmu-catalyst / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Home Page: https://docs.vllm.ai


This repository is not active

About

License: Apache License 2.0


Languages

Python: 79.9%
Cuda: 18.6%
Shell: 0.6%
C: 0.4%
C++: 0.3%
Dockerfile: 0.2%