intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository on GitHub: https://github.com/intel/ipex-llm

Unable to run gemma3

lien-dkseo opened this issue

In ollama-ipex-llm-2.2.0b20250318-win, gemma3:1b runs fine, but gemma3:4b fails to run.

(screenshot attached)

ollama-ipex-llm-2.2.0b20250318-win>ollama -v
ggml_sycl_init: found 1 SYCL devices:
ollama version is 0.5.4-ipexllm-20250318

ollama-ipex-llm-2.2.0b20250318-win>ollama.exe list
ggml_sycl_init: found 1 SYCL devices:
NAME                                                          ID              SIZE      MODIFIED
modelscope.cn/lmstudio-community/gemma-3-1b-it-GGUF:Q4_K_M    d71f1feffed0    806 MB    28 minutes ago
modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M    83a045ecaea4    3.3 GB    45 minutes ago
dolphin-phi:latest                                            c5761fc77240    1.6 GB    46 minutes ago

Windows 11 Home 24H2
32GB RAM

ollama-ipex-llm-2.2.0b20250318-win>start-ollama.bat
ggml_sycl_init: found 1 SYCL devices:
2025/04/02 16:04:14 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY:localhost,127.0.0.1 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\test\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-04-02T16:04:14.297+09:00 level=INFO source=images.go:757 msg="total blobs: 14"
time=2025-04-02T16:04:14.298+09:00 level=INFO source=images.go:764 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2025-04-02T16:04:14.299+09:00 level=INFO source=routes.go:1310 msg="Listening on 127.0.0.1:11434 (version 0.5.4-ipexllm-20250318)"
time=2025-04-02T16:04:14.299+09:00 level=INFO source=routes.go:1339 msg="Dynamic LLM libraries" runners=[ipex_llm]
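For reference, the routes registered above can also be exercised directly over HTTP; a minimal sketch of a request to `/api/generate` (the same endpoint the `ollama run` CLI uses), assuming the default `127.0.0.1:11434` address shown in the server config dump:

```shell
# Hypothetical direct API call, equivalent to what `ollama run` issues internally.
# The address and route come from the server log above; the payload fields
# (model, prompt, stream) follow the standard Ollama /api/generate request shape.
curl http://127.0.0.1:11434/api/generate \
  -d '{"model": "gemma3:1b", "prompt": "hello", "stream": false}'
```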

Running gemma3:1b works:

ollama-ipex-llm-2.2.0b20250318-win>set IPEX_LLM_MODEL_SOURCE=modelscope
ollama-ipex-llm-2.2.0b20250318-win>ollama.exe run gemma3:1b
ggml_sycl_init: found 1 SYCL devices:
>>> Send a message (/? for help)
[GIN] 2025/04/02 - 16:17:52 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2025/04/02 - 16:17:52 | 200 |     21.4884ms |       127.0.0.1 | POST     "/api/show"
time=2025-04-02T16:17:52.716+09:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-04-02T16:17:52.716+09:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-04-02T16:17:52.716+09:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-04-02T16:17:52.716+09:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=16 efficiency=10 threads=22
time=2025-04-02T16:17:52.762+09:00 level=INFO source=server.go:104 msg="system memory" total="31.6 GiB" free="18.6 GiB" free_swap="18.7 GiB"
time=2025-04-02T16:17:52.762+09:00 level=INFO source=memory.go:356 msg="offload to device" layers.requested=-1 layers.model=27 layers.offload=0 layers.split="" memory.available="[18.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="869.7 MiB" memory.required.partial="0 B" memory.required.kv="52.0 MiB" memory.required.allocations="[869.7 MiB]" memory.weights.total="508.5 MiB" memory.weights.repeating="202.5 MiB" memory.weights.nonrepeating="306.0 MiB" memory.graph.full="34.7 MiB" memory.graph.partial="34.7 MiB"
time=2025-04-02T16:17:52.770+09:00 level=INFO source=server.go:392 msg="starting llama server" cmd="D:\\dev\\lien\\2025\\2025-rag-poc\\ollama-ipex-llm-2.2.0b20250318-win\\ollama-lib.exe runner --model C:\\Users\\test\\.ollama\\models\\blobs\\sha256-8d78d9d059a7605c401c105e169e0b08e9f0edc603ceb842f1c4bbb834296d17 --ctx-size 2048 --batch-size 512 --n-gpu-layers 999 --threads 6 --no-mmap --parallel 1 --port 52601"
time=2025-04-02T16:17:52.774+09:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-04-02T16:17:52.774+09:00 level=INFO source=server.go:571 msg="waiting for llama runner to start responding"
time=2025-04-02T16:17:52.774+09:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server error"
ggml_sycl_init: found 1 SYCL devices:
time=2025-04-02T16:17:52.947+09:00 level=INFO source=runner.go:967 msg="starting go runner"
time=2025-04-02T16:17:52.953+09:00 level=INFO source=runner.go:968 msg=system info="CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | cgo(clang)" threads=6
time=2025-04-02T16:17:52.954+09:00 level=INFO source=runner.go:1026 msg="Server listening on 127.0.0.1:52601"
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) Graphics) - 16882 MiB free
llama_model_loader: loaded meta data with 38 key-value pairs and 340 tensors from C:\Users\test\.ollama\models\blobs\sha256-8d78d9d059a7605c401c105e169e0b08e9f0edc603ceb842f1c4bbb834296d17 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 3 1b It
llama_model_loader: - kv   3:                           general.finetune str              = it
llama_model_loader: - kv   4:                           general.basename str              = gemma-3
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                   general.base_model.count u32              = 1
llama_model_loader: - kv   8:                  general.base_model.0.name str              = Gemma 3 1b Pt
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Google
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv  11:                               general.tags arr[str,1]       = ["text-generation"]
llama_model_loader: - kv  12:                      gemma3.context_length u32              = 32768
llama_model_loader: - kv  13:                    gemma3.embedding_length u32              = 1152
llama_model_loader: - kv  14:                         gemma3.block_count u32              = 26
llama_model_loader: - kv  15:                 gemma3.feed_forward_length u32              = 6912
llama_model_loader: - kv  16:                gemma3.attention.head_count u32              = 4
llama_model_loader: - kv  17:    gemma3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  18:                gemma3.attention.key_length u32              = 256
llama_model_loader: - kv  19:              gemma3.attention.value_length u32              = 256
llama_model_loader: - kv  20:                      gemma3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:            gemma3.attention.sliding_window u32              = 512
llama_model_loader: - kv  22:             gemma3.attention.head_count_kv u32              = 1
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
time=2025-04-02T16:17:53.026+09:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  35:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  36:               general.quantization_version u32              = 2
llama_model_loader: - kv  37:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  157 tensors
llama_model_loader: - type q5_0:  117 tensors
llama_model_loader: - type q8_0:   14 tensors
llama_model_loader: - type q4_K:   39 tensors
llama_model_loader: - type q6_K:   13 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 6414
llm_load_vocab: token to piece cache size = 1.9446 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma3
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 262144
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 1152
llm_load_print_meta: n_layer          = 26
llm_load_print_meta: n_head           = 4
llm_load_print_meta: n_head_kv        = 1
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_swa            = 512
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: f_attn_scale     = 6.2e-02
llm_load_print_meta: n_ff             = 6912
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 999.89 M
llm_load_print_meta: model size       = 762.49 MiB (6.40 BPW)
llm_load_print_meta: general.name     = Gemma 3 1b It
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: EOT token        = 106 '<end_of_turn>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 248 '<0x0A>'
llm_load_print_meta: EOG token        = 1 '<eos>'
llm_load_print_meta: EOG token        = 106 '<end_of_turn>'
llm_load_print_meta: max token length = 48
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: offloading 26 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 27/27 layers to GPU
llm_load_tensors:        SYCL0 model buffer size =   762.54 MiB
llm_load_tensors:    SYCL_Host model buffer size =   306.00 MiB
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 2048
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 512
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 1000000.0
llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|  12.71|    128|    1024|   32| 17702M|            1.6.32413|
llama_kv_cache_init:      SYCL0 KV buffer size =    52.00 MiB
llama_new_context_with_model: KV self size  =   52.00 MiB, K (f16):   26.00 MiB, V (f16):   26.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =     1.00 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =   514.25 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    10.26 MiB
llama_new_context_with_model: graph nodes  = 1073
llama_new_context_with_model: graph splits = 2
time=2025-04-02T16:17:54.036+09:00 level=WARN source=runner.go:892 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2025-04-02T16:17:54.277+09:00 level=INFO source=server.go:610 msg="llama runner started in 1.50 seconds"
[GIN] 2025/04/02 - 16:17:54 | 200 |    1.5841203s |       127.0.0.1 | POST     "/api/generate"

However, running gemma3:4b fails with an error:

ollama-ipex-llm-2.2.0b20250318-win>ollama.exe run gemma3
[GIN] 2025/04/02 - 16:04:22 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2025/04/02 - 16:04:22 | 200 |     23.3408ms |       127.0.0.1 | POST     "/api/show"
time=2025-04-02T16:04:22.898+09:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
time=2025-04-02T16:04:22.898+09:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-04-02T16:04:22.898+09:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-04-02T16:04:22.898+09:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=16 efficiency=10 threads=22
time=2025-04-02T16:04:22.937+09:00 level=INFO source=server.go:104 msg="system memory" total="31.6 GiB" free="19.0 GiB" free_swap="18.9 GiB"
time=2025-04-02T16:04:22.943+09:00 level=INFO source=memory.go:356 msg="offload to device" projector.weights="811.8 MiB" projector.graph="0 B" layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[19.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.5 GiB" memory.required.partial="0 B" memory.required.kv="272.0 MiB" memory.required.allocations="[3.5 GiB]" memory.weights.total="2.1 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="90.7 MiB" memory.graph.partial="90.7 MiB"
time=2025-04-02T16:04:22.952+09:00 level=INFO source=server.go:392 msg="starting llama server" cmd="D:\\dev\\lien\\2025\\2025-rag-poc\\ollama-ipex-llm-2.2.0b20250318-win\\ollama-lib.exe runner --model C:\\Users\\test\\.ollama\\models\\blobs\\sha256-be49949e48422e4547b00af14179a193d3777eea7fbbd7d6e1b0861304628a01 --ctx-size 2048 --batch-size 512 --n-gpu-layers 999 --mmproj C:\\Users\\test\\.ollama\\models\\blobs\\sha256-8c0fb064b019a6972856aaae2c7e4792858af3ca4561be2dbf649123ba6c40cb --threads 6 --no-mmap --parallel 1 --port 52546"
time=2025-04-02T16:04:22.956+09:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2025-04-02T16:04:22.956+09:00 level=INFO source=server.go:571 msg="waiting for llama runner to start responding"
time=2025-04-02T16:04:22.957+09:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server error"
ggml_sycl_init: found 1 SYCL devices:
time=2025-04-02T16:04:23.123+09:00 level=INFO source=runner.go:967 msg="starting go runner"
time=2025-04-02T16:04:23.130+09:00 level=INFO source=runner.go:968 msg=system info="CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | cgo(clang)" threads=6
time=2025-04-02T16:04:23.130+09:00 level=INFO source=runner.go:1026 msg="Server listening on 127.0.0.1:52546"
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llama_load_model_from_file: using device SYCL0 (Intel(R) Arc(TM) Graphics) - 16882 MiB free
llama_model_loader: loaded meta data with 40 key-value pairs and 444 tensors from C:\Users\test\.ollama\models\blobs\sha256-be49949e48422e4547b00af14179a193d3777eea7fbbd7d6e1b0861304628a01 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Gemma 3 4b It
llama_model_loader: - kv   3:                           general.finetune str              = it
llama_model_loader: - kv   4:                           general.basename str              = gemma-3
llama_model_loader: - kv   5:                         general.size_label str              = 4B
llama_model_loader: - kv   6:                            general.license str              = gemma
llama_model_loader: - kv   7:                   general.base_model.count u32              = 1
llama_model_loader: - kv   8:                  general.base_model.0.name str              = Gemma 3 4b Pt
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Google
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv  11:                               general.tags arr[str,1]       = ["image-text-to-text"]
llama_model_loader: - kv  12:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv  13:                    gemma3.embedding_length u32              = 2560
llama_model_loader: - kv  14:                         gemma3.block_count u32              = 34
llama_model_loader: - kv  15:                 gemma3.feed_forward_length u32              = 10240
llama_model_loader: - kv  16:                gemma3.attention.head_count u32              = 8
llama_model_loader: - kv  17:    gemma3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  18:                gemma3.attention.key_length u32              = 256
llama_model_loader: - kv  19:              gemma3.attention.value_length u32              = 256
llama_model_loader: - kv  20:                      gemma3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:            gemma3.attention.sliding_window u32              = 1024
llama_model_loader: - kv  22:             gemma3.attention.head_count_kv u32              = 4
llama_model_loader: - kv  23:                   gemma3.rope.scaling.type str              = linear
llama_model_loader: - kv  24:                 gemma3.rope.scaling.factor f32              = 8.000000
llama_model_loader: - kv  25:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  26:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  27:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
time=2025-04-02T16:04:23.208+09:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  28:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  29:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  31:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  32:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  33:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  34:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  35:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  36:                    tokenizer.chat_template str              = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv  37:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  38:               general.quantization_version u32              = 2
llama_model_loader: - kv  39:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  205 tensors
llama_model_loader: - type q4_K:  204 tensors
llama_model_loader: - type q6_K:   35 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 6414
llm_load_vocab: token to piece cache size = 1.9446 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma3
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 262144
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 2560
llm_load_print_meta: n_layer          = 34
llm_load_print_meta: n_head           = 8
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_swa            = 1024
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 2
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: f_attn_scale     = 6.2e-02
llm_load_print_meta: n_ff             = 10240
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 0.125
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 4B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 3.88 B
llm_load_print_meta: model size       = 2.31 GiB (5.12 BPW)
llm_load_print_meta: general.name     = Gemma 3 4b It
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: EOT token        = 106 '<end_of_turn>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 248 '<0x0A>'
llm_load_print_meta: EOG token        = 1 '<eos>'
llm_load_print_meta: EOG token        = 106 '<end_of_turn>'
llm_load_print_meta: max token length = 48
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llm_load_tensors: offloading 34 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors:        SYCL0 model buffer size =  2368.18 MiB
llm_load_tensors:          CPU model buffer size =   525.00 MiB
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 2048
llama_new_context_with_model: n_ctx_per_seq = 2048
llama_new_context_with_model: n_batch       = 512
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 1000000.0
llama_new_context_with_model: freq_scale    = 0.125
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|  12.71|    128|    1024|   32| 17702M|            1.6.32413|
llama_kv_cache_init:      SYCL0 KV buffer size =   272.00 MiB
llama_new_context_with_model: KV self size  =  272.00 MiB, K (f16):  136.00 MiB, V (f16):  136.00 MiB
llama_new_context_with_model:  SYCL_Host  output buffer size =     1.01 MiB
llama_new_context_with_model:      SYCL0 compute buffer size =   517.00 MiB
llama_new_context_with_model:  SYCL_Host compute buffer size =    13.01 MiB
llama_new_context_with_model: graph nodes  = 1401
llama_new_context_with_model: graph splits = 2
key general.file_type not found in file
Exception 0xe06d7363 0x19930520 0x96ddeff880 0x7ffc1cb4933a
PC=0x7ffc1cb4933a
signal arrived during external code execution

runtime.cgocall(0x7ff7a28bd8b0, 0xc000493c78)
        runtime/cgocall.go:167 +0x3e fp=0xc000493c50 sp=0xc000493be8 pc=0x7ff7a1d09c1e
ollama/llama/llamafile._Cfunc_clip_model_load(0x22ac8926fe0, 0x1)
        _cgo_gotypes.go:307 +0x56 fp=0xc000493c78 sp=0xc000493c50 pc=0x7ff7a20df8d6
ollama/llama/llamafile.NewClipContext(0xc000613090, {0xc0000381c0, 0x6a})
        ollama/llama/llamafile/llama.go:488 +0x90 fp=0xc000493d38 sp=0xc000493c78 pc=0x7ff7a20e6cd0
ollama/llama/runner.NewImageContext(0xc000613090, {0xc0000381c0, 0x6a})
        ollama/llama/runner/image.go:37 +0xf8 fp=0xc000493db8 sp=0xc000493d38 pc=0x7ff7a20ebe58
ollama/llama/runner.(*Server).loadModel(0xc00050e120, {0x3e7, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, 0xc00048a810, 0x0}, ...)
        ollama/llama/runner/runner.go:881 +0x24f fp=0xc000493f10 sp=0xc000493db8 pc=0x7ff7a20f19cf
ollama/llama/runner.Execute.gowrap1()
        ollama/llama/runner/runner.go:1001 +0xda fp=0xc000493fe0 sp=0xc000493f10 pc=0x7ff7a20f33da
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000493fe8 sp=0xc000493fe0 pc=0x7ff7a1d18901
created by ollama/llama/runner.Execute in goroutine 1
        ollama/llama/runner/runner.go:1001 +0xd0d

goroutine 1 gp=0xc000086000 m=nil [IO wait]:
runtime.gopark(0x7ff7a1d1a0c0?, 0x7ff7a34b3ac0?, 0xa0?, 0x11?, 0xc0000b124c?)
        runtime/proc.go:424 +0xce fp=0xc00011d418 sp=0xc00011d3f8 pc=0x7ff7a1d103ce
runtime.netpollblock(0x408?, 0xa1ca8366?, 0xf7?)
        runtime/netpoll.go:575 +0xf7 fp=0xc00011d450 sp=0xc00011d418 pc=0x7ff7a1cd4f97
internal/poll.runtime_pollWait(0x22aee434d30, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc00011d470 sp=0xc00011d450 pc=0x7ff7a1d0f645
internal/poll.(*pollDesc).wait(0x7ff7a1da2bd5?, 0x7ff7a1d0b0e5?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc00011d498 sp=0xc00011d470 pc=0x7ff7a1da4207
internal/poll.execIO(0xc0000b11a0, 0xc00011d540)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc00011d510 sp=0xc00011d498 pc=0x7ff7a1da5645
internal/poll.(*FD).acceptOne(0xc0000b1188, 0x418, {0xc00016e000?, 0xc00011d5a0?, 0x7ff7a1dad3c5?}, 0xc00011d5d4?)
        internal/poll/fd_windows.go:946 +0x65 fp=0xc00011d570 sp=0xc00011d510 pc=0x7ff7a1da9c85
internal/poll.(*FD).Accept(0xc0000b1188, 0xc00011d720)
        internal/poll/fd_windows.go:980 +0x1b6 fp=0xc00011d628 sp=0xc00011d570 pc=0x7ff7a1da9fb6
net.(*netFD).accept(0xc0000b1188)
        net/fd_windows.go:182 +0x4b fp=0xc00011d740 sp=0xc00011d628 pc=0x7ff7a1e1082b
net.(*TCPListener).accept(0xc000614600)
        net/tcpsock_posix.go:159 +0x1e fp=0xc00011d790 sp=0xc00011d740 pc=0x7ff7a1e2699e
net.(*TCPListener).Accept(0xc000614600)
        net/tcpsock.go:372 +0x30 fp=0xc00011d7c0 sp=0xc00011d790 pc=0x7ff7a1e25750
net/http.(*onceCloseListener).Accept(0xc0000e94d0?)
        <autogenerated>:1 +0x24 fp=0xc00011d7d8 sp=0xc00011d7c0 pc=0x7ff7a20a0044
net/http.(*Server).Serve(0xc0005304b0, {0x7ff7a2cec6f0, 0xc000614600})
        net/http/server.go:3330 +0x30c fp=0xc00011d908 sp=0xc00011d7d8 pc=0x7ff7a2077fcc
ollama/llama/runner.Execute({0xc0000c8010?, 0x0?, 0x0?})
        ollama/llama/runner/runner.go:1027 +0x11a9 fp=0xc00011dca8 sp=0xc00011d908 pc=0x7ff7a20f2fa9
ollama/cmd.NewCLI.func2(0xc0000c6a00?, {0x7ff7a2b2e8ce?, 0x4?, 0x7ff7a2b2e8d2?})
        ollama/cmd/cmd.go:1430 +0x45 fp=0xc00011dcd0 sp=0xc00011dca8 pc=0x7ff7a28bd0c5
github.com/spf13/cobra.(*Command).execute(0xc00050a008, {0xc0000f6120, 0x11, 0x11})
        github.com/spf13/cobra@v1.8.1/command.go:985 +0xaaa fp=0xc00011de58 sp=0xc00011dcd0 pc=0x7ff7a1eaa4ea
github.com/spf13/cobra.(*Command).ExecuteC(0xc0004c4308)
        github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff fp=0xc00011df30 sp=0xc00011de58 pc=0x7ff7a1eaadbf
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.8.1/command.go:1034
main.main()
        ollama/main.go:12 +0x4d fp=0xc00011df50 sp=0xc00011df30 pc=0x7ff7a28bd72d
runtime.main()
        runtime/proc.go:272 +0x27d fp=0xc00011dfe0 sp=0xc00011df50 pc=0x7ff7a1cddf9d
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff7a1d18901

goroutine 2 gp=0xc000086700 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000089fa8 sp=0xc000089f88 pc=0x7ff7a1d103ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0xc000089fe0 sp=0xc000089fa8 pc=0x7ff7a1cde2b8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000089fe8 sp=0xc000089fe0 pc=0x7ff7a1d18901
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x1a

goroutine 3 gp=0xc000086a80 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00008bf80 sp=0xc00008bf60 pc=0x7ff7a1d103ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0xc000098000)
        runtime/mgcsweep.go:317 +0xdf fp=0xc00008bfc8 sp=0xc00008bf80 pc=0x7ff7a1cc6f9f
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x7ff7a1cbb5c5
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x7ff7a1d18901
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000086c40 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff7a2cdbb18?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00009ff78 sp=0xc00009ff58 pc=0x7ff7a1d103ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x7ff7a34d79c0)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc00009ffa8 sp=0xc00009ff78 pc=0x7ff7a1cc4969
runtime.bgscavenge(0xc000098000)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc00009ffc8 sp=0xc00009ffa8 pc=0x7ff7a1cc4ef9
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc00009ffe0 sp=0xc00009ffc8 pc=0x7ff7a1cbb565
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00009ffe8 sp=0xc00009ffe0 pc=0x7ff7a1d18901
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000087180 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc0000a1e20 sp=0xc0000a1e00 pc=0x7ff7a1d103ce
runtime.runfinq()
        runtime/mfinal.go:193 +0x107 fp=0xc0000a1fe0 sp=0xc0000a1e20 pc=0x7ff7a1cba687
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x7ff7a1d18901
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x3d

goroutine 6 gp=0xc0001e8380 m=nil [chan receive]:
runtime.gopark(0xc00008df60?, 0x7ff7a1dfa2e5?, 0x10?, 0x68?, 0x7ff7a2d028a0?)
        runtime/proc.go:424 +0xce fp=0xc00008df18 sp=0xc00008def8 pc=0x7ff7a1d103ce
runtime.chanrecv(0xc0000383f0, 0x0, 0x1)
        runtime/chan.go:639 +0x41e fp=0xc00008df90 sp=0xc00008df18 pc=0x7ff7a1caac9e
runtime.chanrecv1(0x7ff7a1cde100?, 0xc00008df76?)
        runtime/chan.go:489 +0x12 fp=0xc00008dfb8 sp=0xc00008df90 pc=0x7ff7a1caa852
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1784 +0x2f fp=0xc00008dfe0 sp=0xc00008dfb8 pc=0x7ff7a1cbe6af
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x7ff7a1d18901
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1779 +0x96

goroutine 7 gp=0xc0001e8a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00009bf38 sp=0xc00009bf18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00009bfc8 sp=0xc00009bf38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00009bfe0 sp=0xc00009bfc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00009bfe8 sp=0xc00009bfe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 8 gp=0xc0001e8c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00009df38 sp=0xc00009df18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00009dfc8 sp=0xc00009df38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00009dfe0 sp=0xc00009dfc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00009dfe8 sp=0xc00009dfe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 18 gp=0xc0001061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 34 gp=0xc0004861c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 9 gp=0xc0001e8e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 10 gp=0xc0001e8fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000475f38 sp=0xc000475f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000475fc8 sp=0xc000475f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000475fe0 sp=0xc000475fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000475fe8 sp=0xc000475fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 11 gp=0xc0001e9180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 12 gp=0xc0001e9340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000471f38 sp=0xc000471f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000471fc8 sp=0xc000471f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000471fe0 sp=0xc000471fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000471fe8 sp=0xc000471fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 13 gp=0xc0001e9500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00047bf38 sp=0xc00047bf18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00047bfc8 sp=0xc00047bf38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00047bfe0 sp=0xc00047bfc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047bfe8 sp=0xc00047bfe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 14 gp=0xc0001e96c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00047df38 sp=0xc00047df18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00047dfc8 sp=0xc00047df38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00047dfe0 sp=0xc00047dfc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047dfe8 sp=0xc00047dfe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 50 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000477f38 sp=0xc000477f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000477fc8 sp=0xc000477f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000477fe0 sp=0xc000477fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000477fe8 sp=0xc000477fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 51 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000479f38 sp=0xc000479f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000479fc8 sp=0xc000479f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000479fe0 sp=0xc000479fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000479fe8 sp=0xc000479fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 35 gp=0xc000486380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 15 gp=0xc0001e9880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000585f38 sp=0xc000585f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000585fc8 sp=0xc000585f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000585fe0 sp=0xc000585fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000585fe8 sp=0xc000585fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 19 gp=0xc000106540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 20 gp=0xc000106700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000581f38 sp=0xc000581f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000581fc8 sp=0xc000581f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000581fe0 sp=0xc000581fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000581fe8 sp=0xc000581fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 21 gp=0xc0001068c0 m=nil [GC worker (idle)]:
runtime.gopark(0x27cf82b26c8?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000583f38 sp=0xc000583f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000583fc8 sp=0xc000583f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000583fe0 sp=0xc000583fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000583fe8 sp=0xc000583fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 66 gp=0xc00058c000 m=nil [GC worker (idle)]:
runtime.gopark(0x27cf82b26c8?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000593f38 sp=0xc000593f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000593fc8 sp=0xc000593f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000593fe0 sp=0xc000593fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000593fe8 sp=0xc000593fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 16 gp=0xc0001e9a40 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff7a35264a0?, 0x1?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000587f38 sp=0xc000587f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000587fc8 sp=0xc000587f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000587fe0 sp=0xc000587fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 36 gp=0xc000486540 m=nil [GC worker (idle)]:
runtime.gopark(0x27cf82b26c8?, 0x1?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc00058ff38 sp=0xc00058ff18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc00058ffc8 sp=0xc00058ff38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc00058ffe0 sp=0xc00058ffc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00058ffe8 sp=0xc00058ffe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 37 gp=0xc000486700 m=nil [GC worker (idle)]:
runtime.gopark(0x27cf82b26c8?, 0x1?, 0x54?, 0xd?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000591f38 sp=0xc000591f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000591fc8 sp=0xc000591f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000591fe0 sp=0xc000591fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000591fe8 sp=0xc000591fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 38 gp=0xc0004868c0 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff7a35264a0?, 0x1?, 0x54?, 0xd?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000491f38 sp=0xc000491f18 pc=0x7ff7a1d103ce
runtime.gcBgMarkWorker(0xc0000399d0)
        runtime/mgc.go:1412 +0xe9 fp=0xc000491fc8 sp=0xc000491f38 pc=0x7ff7a1cbd9a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1328 +0x25 fp=0xc000491fe0 sp=0xc000491fc8 pc=0x7ff7a1cbd885
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000491fe8 sp=0xc000491fe0 pc=0x7ff7a1d18901
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1328 +0x105

goroutine 53 gp=0xc000106c40 m=nil [semacquire]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0xa0?, 0x0?)
        runtime/proc.go:424 +0xce fp=0xc000595e18 sp=0xc000595df8 pc=0x7ff7a1d103ce
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.semacquire1(0xc00050e128, 0x0, 0x1, 0x0, 0x12)
        runtime/sema.go:178 +0x232 fp=0xc000595e80 sp=0xc000595e18 pc=0x7ff7a1cf0092
sync.runtime_Semacquire(0x0?)
        runtime/sema.go:71 +0x25 fp=0xc000595eb8 sp=0xc000595e80 pc=0x7ff7a1d118a5
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc000595ee0 sp=0xc000595eb8 pc=0x7ff7a1d296c8
ollama/llama/runner.(*Server).run(0xc00050e120, {0x7ff7a2cee9b0, 0xc0003ea050})
        ollama/llama/runner/runner.go:315 +0x47 fp=0xc000595fb8 sp=0xc000595ee0 pc=0x7ff7a20edec7
ollama/llama/runner.Execute.gowrap2()
        ollama/llama/runner/runner.go:1006 +0x28 fp=0xc000595fe0 sp=0xc000595fb8 pc=0x7ff7a20f32c8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000595fe8 sp=0xc000595fe0 pc=0x7ff7a1d18901
created by ollama/llama/runner.Execute in goroutine 1
        ollama/llama/runner/runner.go:1006 +0xde5

goroutine 82 gp=0xc0005048c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc0006adba0?, 0x48?, 0xdc?, 0xc0006adc4c?)
        runtime/proc.go:424 +0xce fp=0xc000045890 sp=0xc000045870 pc=0x7ff7a1d103ce
runtime.netpollblock(0x414?, 0xa1ca8366?, 0xf7?)
        runtime/netpoll.go:575 +0xf7 fp=0xc0000458c8 sp=0xc000045890 pc=0x7ff7a1cd4f97
internal/poll.runtime_pollWait(0x22aee434c18, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc0000458e8 sp=0xc0000458c8 pc=0x7ff7a1d0f645
internal/poll.(*pollDesc).wait(0x0?, 0xc000045938?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000045910 sp=0xc0000458e8 pc=0x7ff7a1da4207
internal/poll.execIO(0xc0006adba0, 0x7ff7a2bb05e8)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000045988 sp=0xc000045910 pc=0x7ff7a1da5645
internal/poll.(*FD).Read(0xc0006adb88, {0xc0001cd000, 0x1000, 0x1000})
        internal/poll/fd_windows.go:438 +0x2a7 fp=0xc000045a30 sp=0xc000045988 pc=0x7ff7a1da6347
net.(*netFD).Read(0xc0006adb88, {0xc0001cd000?, 0xc000045aa0?, 0x7ff7a1da46c5?})
        net/fd_posix.go:55 +0x25 fp=0xc000045a78 sp=0xc000045a30 pc=0x7ff7a1e0e945
net.(*conn).Read(0xc00013a3a8, {0xc0001cd000?, 0x0?, 0xc0006165a8?})
        net/net.go:189 +0x45 fp=0xc000045ac0 sp=0xc000045a78 pc=0x7ff7a1e1df25
net.(*TCPConn).Read(0xc0006165a0?, {0xc0001cd000?, 0xc0006adb88?, 0xc000045af8?})
        <autogenerated>:1 +0x25 fp=0xc000045af0 sp=0xc000045ac0 pc=0x7ff7a1e2f945
net/http.(*connReader).Read(0xc0006165a0, {0xc0001cd000, 0x1000, 0x1000})
        net/http/server.go:798 +0x14b fp=0xc000045b40 sp=0xc000045af0 pc=0x7ff7a206dd8b
bufio.(*Reader).fill(0xc0000a43c0)
        bufio/bufio.go:110 +0x103 fp=0xc000045b78 sp=0xc000045b40 pc=0x7ff7a1e34583
bufio.(*Reader).Peek(0xc0000a43c0, 0x4)
        bufio/bufio.go:148 +0x53 fp=0xc000045b98 sp=0xc000045b78 pc=0x7ff7a1e346b3
net/http.(*conn).serve(0xc0000e94d0, {0x7ff7a2cee978, 0xc000616540})
        net/http/server.go:2127 +0x738 fp=0xc000045fb8 sp=0xc000045b98 pc=0x7ff7a20730d8
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3360 +0x28 fp=0xc000045fe0 sp=0xc000045fb8 pc=0x7ff7a20783c8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000045fe8 sp=0xc000045fe0 pc=0x7ff7a1d18901
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3360 +0x485
rax     0x0
rbx     0x96ddeff828
rcx     0x26
rdx     0x96ddefef70
rdi     0xe06d7363
rsi     0x1
rbp     0x4
rsp     0x96ddeff700
r8      0xffff0000
r9      0x96ddeff1fc
r10     0x4
r11     0x7ffc1d000000
r12     0xc000493cf8
r13     0x22af5886c90
r14     0xc000106a80
r15     0xfffffffffffff
rip     0x7ffc1cb4933a
rflags  0x206
cs      0x33
fs      0x53
gs      0x2b
time=2025-04-02T16:04:26.261+09:00 level=INFO source=server.go:605 msg="waiting for server to become available" status="llm server error"
time=2025-04-02T16:04:26.512+09:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/04/02 - 16:04:26 | 500 |    3.6347342s |       127.0.0.1 | POST     "/api/generate"

I think this is where the problem occurred.

key general.file_type not found in file
Exception 0xe06d7363 0x19930520 0x96ddeff880 0x7ffc1cb4933a
PC=0x7ffc1cb4933a
signal arrived during external code execution

gemma3:12b also fails with the same error.
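
In case it helps with triage: since the crash is preceded by `key general.file_type not found in file`, one way to check whether a downloaded GGUF actually carries that metadata key is to parse the file header directly. Below is a minimal sketch based on the publicly documented GGUF layout (magic `GGUF`, u32 version, u64 tensor count, u64 kv count, then key/value pairs) — it is not ollama or ipex-llm code, and the function name `read_gguf_keys` is just for illustration.

```python
import struct

GGUF_STRING = 8
GGUF_ARRAY = 9
# fixed-size GGUF value types -> byte widths (uint8..bool, uint64..float64)
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}


def read_gguf_keys(data: bytes) -> list[str]:
    """Return the metadata key names stored in a GGUF blob."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    off = 4
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", data, off)
    off += 20  # u32 + u64 + u64

    def read_string(pos: int) -> tuple[str, int]:
        # GGUF string: u64 length followed by UTF-8 bytes
        (length,) = struct.unpack_from("<Q", data, pos)
        pos += 8
        return data[pos:pos + length].decode("utf-8"), pos + length

    def skip_value(vtype: int, pos: int) -> int:
        # advance past a value without decoding it
        if vtype in _SCALAR_SIZES:
            return pos + _SCALAR_SIZES[vtype]
        if vtype == GGUF_STRING:
            _, pos = read_string(pos)
            return pos
        if vtype == GGUF_ARRAY:
            elem_type, count = struct.unpack_from("<IQ", data, pos)
            pos += 12
            for _ in range(count):
                pos = skip_value(elem_type, pos)
            return pos
        raise ValueError(f"unknown GGUF value type {vtype}")

    keys = []
    for _ in range(n_kv):
        key, off = read_string(off)
        (vtype,) = struct.unpack_from("<I", data, off)
        off += 4
        off = skip_value(vtype, off)
        keys.append(key)
    return keys
```

Running `read_gguf_keys(open(path, "rb").read())` on the blob under `~/.ollama/models/blobs` and checking for `"general.file_type"` in the result would show whether the 4b download is missing the key, or whether the runner simply mishandles its absence.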

Hi @lien-dkseo, we are upgrading ipex-llm Ollama to v0.6.x; gemma3 will be supported then.