intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from Github: https://github.com/intel/ipex-llm

Performance issue @ MTL 155H 16G DDR5

Waying13 opened this issue

ds benchmark.xlsx
I have followed the guideline and run the benchmark for ds-distill-Qwen 7B and 1.5B, but the result data (attached above) seems a little incorrect (especially the TPS and peak memory numbers). Could you help check it, please? Thanks.
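For reference, this is roughly how TPS can be cross-checked by hand (an untested sketch, assuming ipex-llm's HuggingFace-style transformers API on an Intel iGPU/XPU; the model path is a placeholder):

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "path/to/ds-distill-qwen-7b"  # placeholder, not a real path

# Load the model with 4-bit weight quantization and move it to the XPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Explain memory bandwidth in one paragraph."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    model.generate(input_ids, max_new_tokens=32)   # warm-up run
    torch.xpu.synchronize()                        # XPU sync (provided via IPEX)
    start = time.time()
    output = model.generate(input_ids, max_new_tokens=128)
    torch.xpu.synchronize()
    elapsed = time.time() - start

new_tokens = output.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```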

I have done the following inspection:
PS C:\Windows\system32> winsat mem
Windows System Assessment Tool

Running: Feature Enumeration ''
Run Time 00:00:00.00
Running: System Memory Performance Assessment ''
Run Time 00:00:06.44
Memory Performance 46357.78 MB/s
Dshow Video Encode Time 0.00000 s
Dshow Video Decode Time 0.00000 s
Media Foundation Decode Time 0.00000 s
Total Run Time 00:00:07.28

The memory bandwidth is almost half that of the Xiaoxin (U5 125H), so the TPS result is also not good. TPS mainly depends on the bandwidth and frequency of the system memory.
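As a rough sanity check (a back-of-envelope sketch, not a measurement; the weight sizes assume 4-bit quantization), the measured bandwidth alone puts an upper bound on decode speed:

```python
# Roofline estimate: during decoding, each generated token has to stream
# the full set of model weights from memory, so
#   max TPS ≈ memory bandwidth / model weight size.
# The numbers below are illustrative assumptions, not measurements.

bandwidth_gb_s = 46.4  # from winsat: 46357.78 MB/s ≈ 46.4 GB/s

# Approximate weight footprints at 4-bit (~0.5 bytes per parameter).
model_weight_gb = {
    "ds-distill-Qwen 1.5B": 1.5 * 0.5,  # ~0.75 GB
    "ds-distill-Qwen 7B":   7.0 * 0.5,  # ~3.5 GB
}

for name, size_gb in model_weight_gb.items():
    print(f"{name}: <= {bandwidth_gb_s / size_gb:.1f} tokens/s (upper bound)")
```

With roughly half the bandwidth of the 125H machine, this bound drops by about half as well, which would explain the weaker TPS numbers.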