Why is inference speed low when I use vLLM to run DeepSeek-V2?
ZzzybEric opened this issue
I use vLLM to run inference on DeepSeek-V2 and deploy the model with Flask. When a prompt enters the model, it always gets stuck for a long time at the "Processed prompts" step. The code I use is your example code.
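For reference, here is a minimal sketch of the kind of setup described: vLLM's offline `LLM` API behind a Flask endpoint. The model path, port, `tensor_parallel_size`, and sampling settings below are illustrative assumptions, not the exact values from this report.

```python
# Sketch of a Flask deployment around vLLM, under the assumptions above.
from flask import Flask, request, jsonify
from vllm import LLM, SamplingParams

app = Flask(__name__)

# DeepSeek-V2 ships custom modeling code, so trust_remote_code is required.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2",  # assumed checkpoint path
    trust_remote_code=True,
    tensor_parallel_size=8,           # assumed; depends on available GPUs
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json()["prompt"]
    # llm.generate blocks while vLLM shows its "Processed prompts" progress,
    # which is the step the report describes as stalling.
    outputs = llm.generate([prompt], sampling_params)
    return jsonify({"text": outputs[0].outputs[0].text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```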
What's your GPU type?