intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

vllm failure in intelanalytics/ipex-llm-serving-xpu:2.2.0-b13

oldmikeyang opened this issue

Describe the bug
When Open-webui connects to the vLLM server, the server crashes with the following error:
"'OpenAIServingTokenization' object has no attribute 'show_available_models'"

How to reproduce
Steps to reproduce the error:

  1. docker pull intelanalytics/ipex-llm-serving-xpu:2.2.0-b13
  2. Start vLLM serving for Qwen 14B.
  3. Start Open-webui and connect it to the vLLM server through the OpenAI API (see the client sketch after this list).
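
Step 3 can also be exercised without Open-webui by pointing the official `openai` Python client at the server's OpenAI-compatible endpoint. The base URL, API key, and model name below are placeholders, not values from the issue:

```python
from openai import OpenAI

# Placeholder base URL and API key; match them to your vLLM serving setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# Listing models is the first call Open-webui makes and is what crashes on 2.2.0-b13.
for model in client.models.list():
    print(model.id)

# If listing succeeds, a basic chat completion verifies end-to-end serving.
# The model name is a placeholder for whichever Qwen 14B checkpoint is being served.
reply = client.chat.completions.create(
    model="Qwen/Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Hello"}],
)
print(reply.choices[0].message.content)
```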

Screenshots


Environment information
This issue only exists in the Docker image release 2.2.0-b13; release 2.2.0-b11 does not have this issue.
intelanalytics/ipex-llm-serving-xpu 2.2.0-b13


hi @oldmikeyang, please try 2.2.0-b15. This is fixed in b15.

Yes, the b16 release doesn't have this issue.