ninehills / llm-inference-benchmark

LLM Inference benchmark


Why is the inference FTL@1 longer after quantization with the vLLM framework?

luhairong11 opened this issue

vLLM has already fixed this issue.

I will retest soon.

@ninehills
Is there any update on this? Or could you tell me in which version of vLLM this issue was resolved?

In vLLM 0.4.3, my tests show that the quantized version's TTFT is still lower than the non-quantized version's.
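
For anyone who wants to reproduce this comparison, below is a minimal TTFT measurement sketch against two running vLLM OpenAI-compatible servers (one fp16, one AWQ-quantized). The base URLs, model names, and prompt are placeholder assumptions for illustration, not the exact setup used in this issue.

```python
# Minimal TTFT comparison sketch against two vLLM OpenAI-compatible servers.
# Assumptions: fp16 model served on :8000, AWQ-quantized model on :8001;
# model names and prompt are placeholders -- adjust to your own deployment.
import time
from openai import OpenAI


def measure_ttft(base_url: str, model: str, prompt: str) -> float:
    """Return time-to-first-token (seconds) for one streamed completion."""
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    start = time.perf_counter()
    stream = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        stream=True,
    )
    for chunk in stream:
        # The first chunk that carries generated text marks the first token.
        if chunk.choices and chunk.choices[0].text:
            return time.perf_counter() - start
    return float("nan")


if __name__ == "__main__":
    prompt = "Explain the difference between throughput and latency."
    for label, url, model in [
        ("fp16", "http://localhost:8000/v1", "Qwen/Qwen2-7B-Instruct"),
        ("awq", "http://localhost:8001/v1", "Qwen/Qwen2-7B-Instruct-AWQ"),
    ]:
        ttft = measure_ttft(url, model, prompt)
        print(f"{label}: TTFT = {ttft * 1000:.1f} ms")
```

Running several warm-up requests first and averaging over multiple prompts would give a fairer comparison, since the first request after server start includes compilation and cache-warming overhead.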