Cannot reach optimal token throughput when running on GPU
cyskdlx opened this issue · comments
Machine configuration: 14th-gen Core i9;
Memory: 64GB;
GPU: dual Intel Arc A770, 16GB VRAM x2;
SSD: 500GB
Problem description: During single-user interactive inference, with the GPUs processing the data, throughput peaks at only about 15 tokens/s. Log as follows:
Running deepseek-r1-distill-qwen-32b with vLLM on two A770 16GB discrete GPUs, single-user interactive inference should theoretically reach around 30 tokens/s.
Which Docker image are you using?
Closed
Outside the container, run:
sudo xpu-smi config -d x -t 0 --frequencyrange 2400,2400
Then enter the container and run model inference.
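A minimal sketch of the full host-side workflow, assuming the two A770 cards enumerate as device indices 0 and 1 (the original command uses `-d x` as a placeholder; check your actual IDs with `xpu-smi discovery`). The container name below is hypothetical:

```shell
# Lock each Arc A770's GPU frequency to 2400 MHz (tile 0) before inference.
# Device indices 0 and 1 are assumptions for a dual-GPU host.
for dev in 0 1; do
    sudo xpu-smi config -d "$dev" -t 0 --frequencyrange 2400,2400
done

# Then enter the running ipex-llm vLLM container (name is hypothetical)
# and start model inference as usual:
# sudo docker exec -it ipex-llm-serving-container bash
```

Locking the frequency prevents the GPU from downclocking between decode steps, which is what caps the observed tokens/s below the expected rate.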
Thanks~
We added CPU and GPU Frequency Locking Instructions into QuickStart: https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#running-vllm-serving-with-ipex-llm-on-intel-gpu-in-docker