Cannot reach optimal token throughput when running on GPU
cyskdlx opened this issue · comments
Machine configuration: 14th-gen Core i9;
Memory: 64GB;
GPU: dual Intel Arc A770, 16GB VRAM x2;
SSD: 500GB
Problem description: During single-user interactive inference, with the GPUs processing the data, throughput peaks at only about 15 tokens/s. Log as follows:
Running deepseek-r1-distill-qwen-32b with vLLM on two A770 16GB discrete GPUs, single-user interactive inference should theoretically reach around 30 tokens/s.
Which Docker image are you using?
Closed
Outside the container, run:
sudo xpu-smi config -d x -t 0 --frequencyrange 2400,2400
Then enter the container and run model inference.
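A minimal sketch of the full host-side workflow, assuming the two A770 cards enumerate as device indices 0 and 1 (the original command uses `-d x` as a placeholder; check your actual IDs with `xpu-smi discovery`). The container name below is hypothetical:

```shell
# Lock each Arc A770's GPU frequency to 2400 MHz (tile 0) before inference.
# Device indices 0 and 1 are assumptions for a dual-GPU host.
for dev in 0 1; do
    sudo xpu-smi config -d "$dev" -t 0 --frequencyrange 2400,2400
done

# Then enter the running ipex-llm vLLM container (name is hypothetical)
# and start model inference as usual:
# sudo docker exec -it ipex-llm-serving-container bash
```

Locking the frequency prevents the GPU from downclocking between decode steps, which is what caps the observed tokens/s below the expected rate.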
Thanks~
We added CPU and GPU Frequency Locking Instructions into QuickStart: https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md#running-vllm-serving-with-ipex-llm-on-intel-gpu-in-docker