EOF POST predict error
yabo-boye opened this issue
Hello
I installed, with a lot of effort, the ready-made Ollama zip for Ubuntu on Fedora.
I tried to load a model like this:
```text
ollama run dolphin-mistral-24b-gpu:latest
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
hi
Error: POST predict: Post "http://127.0.0.1:44119/completion": EOF
```
This is the error I get. Please help me troubleshoot it!
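A `POST predict ... EOF` usually means the backend runner process died mid-request, so the useful detail is in the server-side log rather than the client output. One way to capture it (a sketch; `OLLAMA_DEBUG=1` enables Ollama's verbose logging, and the `./ollama` path is an assumption based on the zip install):

```shell
# Capture the runner's crash output, which the client-side EOF hides.
# OLLAMA_DEBUG=1 turns on verbose logging; adjust the ./ollama path
# to wherever the zip was extracted.
export OLLAMA_DEBUG=1
./ollama serve > ollama-debug.log 2>&1 &

# Reproduce the failure from a second shell:
ollama run dolphin-mistral-24b-gpu:latest "hi" || true

# The last lines before the EOF show why the runner exited:
tail -n 50 ollama-debug.log
```

The lines immediately before the EOF in that log (an assert, an out-of-memory message, or a SYCL error) are what maintainers will need to diagnose this.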
Segmentation Fault with Intel Arc GPU (SYCL) Even When GPU Offloading Disabled
Description
I’ve encountered a persistent segmentation fault (SIGSEGV) when running Ollama with the ipex-llm nightly build (20250226) on an Intel Arc GPU (Meteor Lake), using oneAPI 2025.0 on Nobara 41. The crash occurs during inference with the deepseek-r1:1.5b model, specifically at ggml_sycl_rms_norm, even when GPU offloading is disabled with OLLAMA_NUM_GPU=0. This suggests that SYCL backend operations are still being invoked despite the intent to run on CPU only.
Switching to vanilla Ollama (CPU-only) resolves the issue, but I’d like to report this bug to help improve Intel GPU support in ipex-llm-based builds.
Environment
OS: Nobara 41 (Linux)
Hardware: Intel Arc Graphics (Meteor Lake), 30.7 GiB total RAM (24.5 GiB free)
Driver: Intel GPU driver version 24.35.30872.32
Ollama Version: Custom build with ipex-llm nightly (20250226), oneAPI 2025.0
Model: deepseek-r1:1.5b (Q4_K quantization, 28 layers, ~1.04 GiB)
Command: ollama run deepseek-r1:1.5b "hi"
Steps to Reproduce
Install ipex-llm nightly build (20250226) with oneAPI 2025.0 on a system with Intel Arc GPU.
Build Ollama from source with this ipex-llm configuration.
Set OLLAMA_NUM_GPU=0 to disable GPU offloading:
```bash
export OLLAMA_NUM_GPU=0
```
Start the Ollama server and run the model:
```bash
./ollama serve > log.txt 2>&1 &
ollama run deepseek-r1:1.5b "hi"
```
Observe the segmentation fault in the logs.
Expected Behavior
With OLLAMA_NUM_GPU=0, Ollama should run entirely on the CPU, avoiding any GPU-related operations (including SYCL backend calls) and complete inference without crashing.
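A quick way to check that expectation (a sketch, assuming the server log was redirected to log.txt as in the reproduction steps):

```shell
# A truly CPU-only run should log no SYCL device or buffer lines.
# In the failing case this matches lines such as
# "llama_new_context_with_model: SYCL0 KV buffer size = 56.00 MiB".
if grep -q "SYCL" log.txt 2>/dev/null; then
  echo "SYCL backend still active despite OLLAMA_NUM_GPU=0"
else
  echo "no SYCL references: running on CPU as expected"
fi
```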
Actual Behavior
The process crashes with a segmentation fault:
```text
call ggml_sycl_rms_norm
SIGSEGV: segmentation violation
PC=0x76ab1c497379 m=0 sigcode=1 addr=0x0
```
Logs show SYCL0 (Intel Arc Graphics) is still initialized and buffers are allocated on the GPU (e.g., KV buffer: 56.00 MiB, compute buffer: 299.75 MiB), despite no layers being offloaded (offloaded 0/29 layers to GPU).
The crash occurs during llama_decode, suggesting SYCL is invoked for RMS normalization or other operations.
Logs
Attached log snippet:
```text
time=2025-03-13T21:55:05.348-03:00 level=INFO source=server.go:610 msg="llama runner started in 1.76 seconds"
call ggml_sycl_rms_norm
SIGSEGV: segmentation violation
PC=0x76ab1c497379 m=0 sigcode=1 addr=0x0
signal arrived during cgo execution
...
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) Graphics) - 29115 MiB free
llm_load_tensors: offloaded 0/29 layers to GPU
llama_new_context_with_model: SYCL0 KV buffer size = 56.00 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 299.75 MiB
```
(Full log available if needed.)
Possible Cause
The SYCL backend in the ipex-llm nightly build is still engaging the GPU for certain operations (e.g., KV cache or RMS norm) even when OLLAMA_NUM_GPU=0 disables layer offloading.
Potential bug in SYCL implementation for Intel Arc GPUs, possibly related to driver version 24.35.30872.32 or the Qwen2 model architecture (deepseek-r1:1.5b).
Nightly build instability might exacerbate the issue.
Workaround
Switching to vanilla Ollama (official binary, CPU-only) avoids the crash entirely, as it doesn’t use SYCL or GPU acceleration by default on Linux without CUDA/ROCm.
Rebuilding Ollama with the CPU-only ipex-llm version (excluding SYCL) should also work, though I haven’t tested this yet.
Suggested Fix
Ensure OLLAMA_NUM_GPU=0 fully disables SYCL/GPU usage, not just layer offloading, in ipex-llm-based builds.
Investigate why ggml_sycl_rms_norm is called when no layers are offloaded—possible misconfiguration or bug in the SYCL backend.
Test with stable ipex-llm releases (not nightly) and newer Intel Arc drivers to isolate the issue.
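The last point can be scripted; a minimal comparison harness (a sketch — `ollama-stable` and `ollama-nightly` are placeholder names for the two builds under test, not real binaries):

```shell
# Placeholder binary names; point these at the stable and nightly builds.
for build in ollama-stable ollama-nightly; do
  OLLAMA_NUM_GPU=0 "./$build" serve > "$build.log" 2>&1 &
  pid=$!
  sleep 2
  "./$build" run deepseek-r1:1.5b "hi" || true
  kill "$pid" 2>/dev/null || true
  # A SIGSEGV in the server log marks the crashing build.
  if grep -q "SIGSEGV" "$build.log"; then
    echo "$build: crashed"
  else
    echo "$build: no crash logged"
  fi
done
```

Running the same prompt under both builds with identical environment variables would show whether the crash is specific to the nightly SYCL backend.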
Additional Notes
I’m unwilling to downgrade my kernel (as suggested in the ipex-llm GPU guide), so I’ve opted for vanilla Ollama for now.
Happy to provide more logs or test patches if needed!
+1, a dual Arc A770 setup hits this error as well.