The code in this repo was used to produce the blog post *How continuous batching enables 23x throughput in LLM inference while reducing p50 latency*.
```
cd benchmark_config
bash vllm_variable_size_latency
```