intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

ollama-0.5.4-ipex-llm-2.2.0b20250226-win model xxx not found

Yaquan2 opened this issue

The error “model xxx not found” occurs when running a DeepSeek model with ollama-0.5.4-ipex-llm-2.2.0b20250226-win.

Steps to reproduce:

  1. Download "ollama-0.5.4-ipex-llm-2.2.0b20250226-win" from https://www.modelscope.cn/models/ipexllm/ollama-ipex-llm/files
  2. Extract the zip file to a folder
  3. Start Ollama serve as follows:
    • Open "Command Prompt" (cmd), enter the extracted folder by "cd /d PATH\TO\EXTRACTED\FOLDER"
    • Run "start-ollama.bat" in the "Command Prompt", and then a window will pop up for Ollama serve
  4. In the same "Command Prompt" (not the pop-up window), run "curl http://localhost:11434/api/generate -d "{ \"model\": \"DeepSeek-R1-Distill-Llama-8B-GGUF\", \"prompt\": \"Hello\" }"" (you may use any other model; see the note on quote escaping below)
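Note that in Windows cmd the inner double quotes of the JSON body must be backslash-escaped so that the payload reaches the server as valid JSON. A correctly escaped form of the request above, assuming the server is on the default port 11434 (this still reproduces the error, since the model id itself turns out to be the problem):

curl http://localhost:11434/api/generate -d "{ \"model\": \"DeepSeek-R1-Distill-Llama-8B-GGUF\", \"prompt\": \"Hello\" }"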

error message:
C:\Users\X>curl http://localhost:11434/api/generate -d "{ \"model\": \"deepseek-r1:8b\", \"prompt\": \"Hello\" }"
{"error":"model 'deepseek-r1:8b' not found"}
C:\Users\X>cd C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win

C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win>ollama list
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
NAME                                                             ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M   4c337bc8b461    4.9 GB    4 days ago
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M   5782868bf1bb    9.0 GB    5 days ago

C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win>curl http://localhost:11434/api/generate -d "{ \"model\": \"DeepSeek-R1-Distill-Llama-8B-GGUF\", \"prompt\": \"Hello\" }"
{"error":"model 'DeepSeek-R1-Distill-Llama-8B-GGUF' not found"}
C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win>curl http://localhost:11434/api/generate -d "{ \"model\": \"DeepSeek-R1-Distill-Qwen-14B-GGUF\", \"prompt\": \"Hello\" }"
{"error":"model 'DeepSeek-R1-Distill-Qwen-14B-GGUF' not found"}

Using the full model id exactly as shown in the ollama list output, the request succeeds:

C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win>curl http://localhost:11434/api/generate -d "{ \"model\": \"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M\",  \"prompt\": \"Hello\" }"
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:19.7775221Z","response":"\u003cthink\u003e","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:19.8424844Z","response":"\n\n","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:19.9085882Z","response":"\u003c/think\u003e","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:19.9743624Z","response":"\n\n","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.0421396Z","response":"Hello","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.1099083Z","response":"!","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.1747429Z","response":" How","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.2400073Z","response":" can","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.3047167Z","response":" I","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.3693117Z","response":" assist","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.435475Z","response":" you","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.4994315Z","response":" today","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.5655427Z","response":"?","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.6965622Z","response":" 😊","done":false}
{"model":"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M","created_at":"2025-03-10T08:06:20.7619233Z","response":"","done":true,"done_reason":"stop","context":[128011,9906,128012,128013,271,128014,271,9906,0,2650,649,358,7945,499,3432,30,27623,232],"total_duration":10450208800,"load_duration":7843898900,"prompt_eval_count":4,"prompt_eval_duration":1613000000,"eval_count":16,"eval_duration":990000000}

curl http://localhost:11434/api/generate -d "{ \"model\": \"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M\", \"prompt\": \"Hello\", \"stream\": false }"

We suspect this is caused by ModelScope and ipex-llm using different model source addresses for downloads. Please download the 20250226 build from the link below:

https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly

Run ollama run deepseek-r1:8b

Then run curl http://localhost:11434/api/generate -d "{ \"model\": \"deepseek-r1:8b\", \"prompt\": \"Hello\" }"
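In other words, with the GitHub build the short Ollama-library ids work directly. A minimal sketch of that flow, assuming the server has already been started with start-ollama.bat on the default port:

ollama pull deepseek-r1:8b
curl http://localhost:11434/api/generate -d "{ \"model\": \"deepseek-r1:8b\", \"prompt\": \"Hello\", \"stream\": false }"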

Hi @Yaquan2,

DeepSeek-R1-Distill-Llama-8B-GGUF is not a valid model id on either the Ollama library or ModelScope.

An Ollama portable zip downloaded from ModelScope uses ModelScope as its default model source. Models downloaded with ModelScope as the source still show their full model id in ollama list, e.g.

NAME                                                             ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M   xxxxxxxxxxxx    xxx GB    About a minute ago

Except for ollama run and ollama pull, the model must be identified by its full id, e.g. ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M or curl http://localhost:11434/api/generate -d "{\"model\": \"modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M\", \"prompt\": \"Hello\", \"stream\": false}"
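If in doubt about the exact name, the /api/tags endpoint lists the locally available models under the same names that ollama list prints (default port assumed):

curl http://localhost:11434/api/tags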

Please refer to here for more information :)

We found that the problem was due to the model names differing between downloads from GitHub and ModelScope. It's working as expected now, so I'm closing this issue. Thank you for your help!

C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win (1)>ollama list
ggml_sycl_init: GGML_SYCL_FORCE_MMQ: no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
NAME                                                             ID              SIZE      MODIFIED
deepseek-r1:7b                                                   0a8c26691023    4.7 GB    29 minutes ago
modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M   4c337bc8b461    4.9 GB    5 days ago
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q4_K_M   5782868bf1bb    9.0 GB    6 days ago

C:\Users\X\Documents\ollama-0.5.4-ipex-llm-2.2.0b20250226-win (1)>curl http://localhost:11434/api/generate -d "{ \"model\": \"deepseek-r1:7b\", \"prompt\": \"Hello\", \"stream\": false }"
{"model":"deepseek-r1:7b","created_at":"2025-03-11T02:47:29.4971839Z","response":"\u003cthink\u003e\n\n\u003c/think\u003e\n\nHello! How can I assist you today? 😊","done":true,"done_reason":"stop","context":[151644,9707,151645,151648,271,151649,271,9707,0,2585,646,358,7789,498,3351,30,26525,232],"total_duration":977898300,"load_duration":20709700,"prompt_eval_count":4,"prompt_eval_duration":55000000,"eval_count":16,"eval_duration":901000000}

ollama show --modelfile deepseek-r1:8b
and
ollama show --modelfile modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M

can show the difference between deepseek-r1:8b and modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M
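To inspect the differences side by side, the two modelfiles can be dumped to files and compared with cmd's built-in fc tool (the output file names here are arbitrary):

ollama show --modelfile deepseek-r1:8b > library-model.txt
ollama show --modelfile modelscope.cn/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M > modelscope-model.txt
fc library-model.txt modelscope-model.txt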