intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

llama_server.exe built with llama.cpp d7cfe1f crashes when using ipex-llm to improve performance.

cjsdurj opened this issue

Environment

OS: Windows 11; CPU: Intel Core Ultra 7 155H; Compiler: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4 (2025.0.4.20241205)

Steps to reproduce

  1. Clone llama.cpp, check out d7cfe1f, and build llama-server.exe with the DPC++ Compiler.
  2. Copy the *.dll files from the ipex-llm package into llama.cpp/build/bin, overwriting the original DLLs (see the sketch after this list).
  3. Start llama-server; it crashes while loading the model.
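For reference, step 2 can be scripted. Below is a minimal sketch, assuming the ipex-llm DLLs sit in a local `C:\ipex-llm\bin` directory and the llama.cpp build output is in `C:\llama.cpp\build\bin`; both paths are placeholders for illustration and should be adjusted to your setup.

```python
# Sketch: overwrite the freshly built llama.cpp DLLs with the ones shipped by ipex-llm.
# Both directory paths are assumptions for illustration; adjust them to your machine.
import shutil
from pathlib import Path

ipex_llm_bin = Path(r"C:\ipex-llm\bin")          # assumed location of the ipex-llm *.dll files
llama_cpp_bin = Path(r"C:\llama.cpp\build\bin")  # assumed llama.cpp build output directory

for dll in ipex_llm_bin.glob("*.dll"):
    target = llama_cpp_bin / dll.name
    shutil.copy2(dll, target)                    # overwrites any original DLL of the same name
    print(f"copied {dll.name} -> {target}")
```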

Hi, we provide llama_server.exe in our nightly package; you can use it directly by following our guide (https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md).

In my use case, I have added an OpenAI-style video & image chat API (using VL models) and some other code to llama_server, so building from source and replacing ggml.dll & llama.dll with the DLLs from the ipex-llm package is the workflow I need.
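To make the use case concrete, here is a minimal sketch of what an OpenAI-style image chat request against such a local server might look like. The endpoint path, port, model name, and exact payload shape are assumptions based on the common `/v1/chat/completions` convention; the API the issue author actually added may differ.

```python
# Sketch of an OpenAI-style image chat request against a local llama-server.
# The endpoint path, port, model name, and payload shape are assumptions for
# illustration only; the custom API described in this issue may differ.
import base64
import requests

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "local-vl-model",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```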