Ollama Linux No Response Issue with IPEX-LLM
RobinJing opened this issue
OS: Linux Ubuntu 22.04
Kernel: 5.13
GPU: A770
Platform: RPL-P
After installing and starting ollama following the guide, queries get no response, and there is no output on the ollama side either.
```
|  |                  |                               | Compute    | Max compute | Max work | Max sub |                 |
|ID| Device Type      | Name                          | capability | units       | group    | group   | Global mem size |
| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics|        1.3 |         512 |     1024 |      32 |     16225243136 |
ggml_backend_sycl_set_mul_device_mode: true
detect 1 SYCL GPUs: [0] with top Max compute units:512
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: SYCL0 buffer size = 3577.56 MiB
llm_load_tensors: CPU buffer size = 70.31 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: SYCL0 KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.14 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 180.00 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 12.01 MiB
llama_new_context_with_model: graph nodes = 1062
llama_new_context_with_model: graph splits = 2
```
The issue has been reproduced, and we are working on resolving it.
Before starting the ollama server, please set the environment configuration as below:
export LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/your_oneapi_version/lib:/opt/intel/oneapi/compiler/your_oneapi_version/lib
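The export above can be scripted so the oneAPI version only has to be set in one place; a minimal sketch (the `2024.0` version directory is a hypothetical example — substitute whichever version is actually installed under `/opt/intel/oneapi/mkl/`):

```shell
#!/bin/sh
# Hypothetical oneAPI version; check `ls /opt/intel/oneapi/mkl/` for the real one.
ONEAPI_VERSION="2024.0"

# Prepend the MKL and compiler runtime libraries, preserving any existing
# LD_LIBRARY_PATH entries after them.
export LD_LIBRARY_PATH="/opt/intel/oneapi/mkl/${ONEAPI_VERSION}/lib:/opt/intel/oneapi/compiler/${ONEAPI_VERSION}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Confirm the paths are in place before launching the server.
echo "$LD_LIBRARY_PATH"
```

With the variable set, start the server from the same shell (e.g. `./ollama serve`) so the process inherits the library search path; alternatively, sourcing oneAPI's own `setvars.sh` sets these paths as well.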