intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

CPU memory usage increases by the same amount as GPU memory when running Ollama with ipex_llm. Is this expected?

sunyq1995 opened this issue · comments

Hi, I noticed that CPU memory usage increases by the same amount as GPU memory when running Ollama with ipex_llm. Is this expected?
I've set:
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
set OLLAMA_INTEL_GPU=9999
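For reference, the two settings above would be applied in a Windows command prompt (cmd.exe) before launching Ollama. This is a sketch based on the reporter's values; the exact launch script name depends on how the ipex-llm Ollama package was unpacked, and `ollama serve` here assumes the bundled binary is on `PATH`:

```shell
:: Apply the reporter's environment settings (values copied from the report above)
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
set OLLAMA_INTEL_GPU=9999

:: Then start the Ollama server from the ipex-llm package
ollama serve
```

Note that `set` in cmd.exe only affects the current console session, so Ollama must be started from the same window for the variables to take effect.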

Version: ollama-ipex-llm-2.2.0b20250313-win

Are you using a dGPU or iGPU?

dGPU. Got it, thanks!