The code in this repo was used to produce the blog post *How continuous batching enables 23x throughput in LLM inference while reducing p50 latency*.
```
cd benchmark_config
bash vllm_variable_size_latency
```