intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Ollama Linux No Response Issue with IPEX-LLM

RobinJing opened this issue

OS: Linux Ubuntu 22.04
Kernel: 5.13
GPU: A770
Platform: RPL-P
After installing and starting Ollama by following the guide, queries get no response, and there is no output on the Ollama side either.
|ID| Device Type| Name| Compute capability| Max compute units| Max work group| Max sub group| Global mem size|
| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:512
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: SYCL0 buffer size = 3577.56 MiB
llm_load_tensors: CPU buffer size = 70.31 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: SYCL0 KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.14 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 180.00 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 12.01 MiB
llama_new_context_with_model: graph nodes = 1062
llama_new_context_with_model: graph splits = 2
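
For reference, the kind of request that hangs can be reproduced against the Ollama REST API directly (assuming the default port 11434 and an already pulled model; "llama2" below is only a placeholder model name, not one taken from this report):

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "stream": false
}'

If this call never returns and nothing is printed on the server side, the hang is in the server/GPU path rather than in the client.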

The issue has been reproduced, and we are working on resolving it.

Before starting the Ollama server, please set the environment variables as below:

export LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/your_oneapi_version/lib:/opt/intel/oneapi/compiler/your_oneapi_version/lib
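
For context, a minimal startup sketch assuming oneAPI is installed under /opt/intel/oneapi (substitute the actual version directory for your_oneapi_version). The additional variables come from the IPEX-LLM Ollama quickstart rather than from this thread, so treat them as assumptions for a typical Arc setup:

# Point the dynamic loader at the oneAPI MKL and compiler runtime libraries (the workaround above)
export LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/your_oneapi_version/lib:/opt/intel/oneapi/compiler/your_oneapi_version/lib

# Settings suggested in the IPEX-LLM Ollama quickstart (assumed, not from this issue)
export OLLAMA_NUM_GPU=999        # offload all model layers to the Intel GPU
export ZES_ENABLE_SYSMAN=1       # enable Level Zero sysman for GPU memory reporting
export SYCL_CACHE_PERSISTENT=1   # cache compiled SYCL kernels between runs

# Start the IPEX-LLM build of the Ollama server
./ollama serve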