intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Is there a way to run ollama with IPEX-LLM on CPU

reeingal opened this issue · comments

I want to run ollama with IPEX-LLM on a machine with 4 Intel Xeon CPU E7-4830 v3 processors and 256GB of memory. The operating system is Ubuntu 24.04. I followed the steps in the official tutorial as follows (the full command sequence is also collected after the list):

  1. Install Intel® oneAPI Base Toolkit and HPC Toolkit:
    apt-get install intel-basekit
    apt-get install intel-hpckit
    
  2. Install ipex-llm[cpp]:
    pip install --pre --upgrade ipex-llm[cpp]
    
  3. Execute init-ollama:
    init-ollama
    
  4. Run ollama:
    source /opt/intel/oneapi/setvars.sh
    ollama serve
    
  5. Pull the model:
    ollama pull qwen:7b-chat
    
  6. Chat with ollama through open-webui.
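
For reference, here is the same sequence collected into one shell session. This is only a sketch under the tutorial's assumptions (default oneAPI install path /opt/intel/oneapi, Intel oneAPI apt repository already configured):

    # 1. Intel oneAPI Base and HPC toolkits
    sudo apt-get install -y intel-basekit intel-hpckit

    # 2. ipex-llm with the cpp extra; the quotes keep the shell from expanding [cpp]
    pip install --pre --upgrade "ipex-llm[cpp]"

    # 3. Set up the ollama launcher
    init-ollama

    # 4. Load the oneAPI environment and start the server
    source /opt/intel/oneapi/setvars.sh
    ollama serve

    # 5. In a second shell: pull the model (open-webui then talks to this server)
    ollama pull qwen:7b-chat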

But when I selected the model in open-webui and sent a question, I received a response with error code 500.
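
To check whether the 500 comes from ollama itself rather than from open-webui, the ollama HTTP API can also be queried directly on its default port 11434 (this is the standard ollama API, nothing IPEX-LLM specific); the same error should show up there:

    # Ask the running ollama server for a completion, bypassing open-webui
    curl http://localhost:11434/api/generate \
      -d '{"model": "qwen:7b-chat", "prompt": "Hello", "stream": false}'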

I checked the console, and the last output was as follows:

[SYCL] call ggml_init_sycl
ggml_init_sycl: GGML_SYCL_DEBUG: 0
ggml_init_sycl: GGML_SYCL_F16: no
found 2 SYCL devices:
|  |                  |                                             |       |Max compute|Max work|Max sub|               |                                  |
|ID|       Device Type|                                         Name|Version|units      |group   |group  |Global mem size|                    Driver version|
|--|------------------|---------------------------------------------|-------|-----------|--------|-------|---------------|----------------------------------|
| 0|    [opencl:cpu:0]|    Intel(R) Xeon(R) CPU E7-4830 v3 @ 2.10GHz|    3.0|         96|    8192|     64|        270317M|2024.17.5.0.08_160000.xmain-hotfix|
| 1|    [opencl:acc:0]|                  Intel FPGA Emulation Device|    1.2|         96|67108864|     64|        270317M|2024.17.5.0.08_160000.xmain-hotfix|
ggml_backend_sycl_set_mul_device_mode: true
llama_model_load: error loading model: DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)
llama_load_model_from_file: exception loading model
terminate called after throwing an instance of 'sycl::_V1::invalid_parameter_error'
  what(): DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)
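
Presumably the "DeviceList is empty" failure simply means the SYCL backend found no usable GPU: the table above lists only an OpenCL CPU device and the FPGA emulation device. The sycl-ls tool that ships with the oneAPI Base Toolkit reports the same information and is a quick way to confirm what the runtime can see:

    # List every SYCL platform/device visible to the oneAPI runtime
    source /opt/intel/oneapi/setvars.sh
    sycl-ls
    # On this machine only [opencl:cpu:*] and [opencl:acc:*] entries appear,
    # i.e. there is no Level Zero or OpenCL GPU device for the backend to use.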

I couldn't find a way to run ollama with IPEX-LLM on a CPU anywhere in the official documentation. I hope someone can point out what I'm doing wrong.

Hi @reeingal, ollama with IPEX-LLM does not support running on a CPU-only platform, as we haven't optimized ollama for CPU. You may switch to a supported Intel GPU to enable the IPEX-LLM optimizations.
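
For comparison, a rough sketch of the GPU path on a supported machine (Arc, Flex, Max, or a recent iGPU), reusing the setup steps from the question; ONEAPI_DEVICE_SELECTOR is a standard oneAPI runtime variable rather than an IPEX-LLM-specific switch:

    source /opt/intel/oneapi/setvars.sh

    # sycl-ls should now show a [level_zero:gpu:*] (or [opencl:gpu:*]) entry
    sycl-ls

    # Optionally pin SYCL to the first Level Zero device before starting the server
    export ONEAPI_DEVICE_SELECTOR=level_zero:0

    ollama serve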