Why is inference speed low when I use vLLM to run DeepSeek-V2?
ZzzybEric opened this issue
I use vLLM to run inference on DeepSeek-V2 and deploy the model with Flask. When a prompt enters the model, it always gets stuck for a long time at the "Processed prompts" step. The code I use is your example code.
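For reference, here is a minimal sketch of the kind of setup described: vLLM's offline `LLM` API behind a Flask endpoint. The model path, port, `tensor_parallel_size`, and sampling settings below are illustrative assumptions, not the exact values from this report.

```python
# Sketch of a Flask deployment around vLLM, under the assumptions above.
from flask import Flask, request, jsonify
from vllm import LLM, SamplingParams

app = Flask(__name__)

# DeepSeek-V2 ships custom modeling code, so trust_remote_code is required.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2",  # assumed checkpoint path
    trust_remote_code=True,
    tensor_parallel_size=8,           # assumed; depends on available GPUs
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.get_json()["prompt"]
    # llm.generate blocks while vLLM shows its "Processed prompts" progress,
    # which is the step the report describes as stalling.
    outputs = llm.generate([prompt], sampling_params)
    return jsonify({"text": outputs[0].outputs[0].text})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```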
What's your GPU type?