intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

IPEX-LLM on Ultra 155H

kylinzhao90 opened this issue

We are testing on an Intel Core Ultra 7 155H platform (https://ark.intel.com/content/www/us/en/ark/products/236847/intel-core-ultra-7-processor-155h-24m-cache-up-to-4-80-ghz.html).
We successfully installed Text Generation WebUI on the Intel GPU following https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html.
However, after loading the model on the iGPU and chatting with the AI, the entire output is meaningless text, as shown below. Can you help take a look?
(screenshot: garbled output when the model is loaded on the iGPU)

For comparison, when the model is loaded on the CPU, it works well, as shown below.
(screenshot: correct output when the model is loaded on the CPU)
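
For reference, the comparison can be reproduced outside the WebUI with a short script. The following is a minimal sketch based on the ipex-llm GPU quickstart, not code from this thread; the prompt, `max_new_tokens`, and the CPU-then-XPU loop are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from this thread): load TinyLlama with
# ipex-llm 4-bit quantization and compare CPU vs. iGPU ("xpu") generations.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm's drop-in class

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids

# load_in_4bit applies ipex-llm's INT4 weight quantization, the same low-bit
# path the WebUI quickstart relies on.
model = AutoModelForCausalLM.from_pretrained(MODEL, load_in_4bit=True)

for device in ("cpu", "xpu"):  # run on the CPU first, then move to the iGPU
    model = model.to(device)
    with torch.inference_mode():
        output = model.generate(input_ids.to(device), max_new_tokens=32)
    print(f"--- {device} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the XPU pass prints gibberish while the CPU pass does not, that isolates the problem to the GPU path rather than to the WebUI itself.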

Hi @kylinzhao90, which model are you running?

TinyLlama/TinyLlama-1.1B-Chat-v1.0

Hi @kylinzhao90, we are trying to reproduce your issue and will let you know when we make progress.

Hi @kylinzhao90, I cannot reproduce your issue. May I know your ipex-llm and transformers versions?

(screenshot: ipex-llm and transformers versions in the test environment)
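
Since the screenshot is not preserved here, a quick standard-library way to report the requested versions (assuming the packages were installed with pip) is:

```python
# Print the installed ipex-llm / transformers / torch versions via
# importlib.metadata (Python 3.8+).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("ipex-llm", "transformers", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```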

My test environment has been deleted, so please close this issue. I will ask for help again if I encounter the problem.