intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

ipex-llm (0517) fails to run 'baichuan-inc/Baichuan2-7B-Chat' with batch_size==2 and batch_size==4 at input_length 32-32, 1024-128, and 2048-256

MargarettMao opened this issue

Transformers: 4.37.0
ipex-llm: 0517
Precision: sym_int4
Test API: "transformer_int4_gpu"
Device: arc11

ipex-llm (0516) could successfully run 'baichuan-inc/Baichuan2-7B-Chat' with batch_size==2 at input_length '32-32', '1024-128', and '2048-256', and with batch_size==4 at '32-32' and '1024-128' (but not '2048-256').
ipex-llm (0517) fails to run the model with batch_size==2 or batch_size==4 at any of these three input lengths.
Both versions work with batch_size==1 at all three input lengths.
(screenshot attached in the original issue)
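
For reference, below is a minimal repro sketch of the batched run described above, using ipex-llm's transformers-style Python API with sym_int4 on an Intel GPU (XPU). The prompt and output length are placeholder values; the original report uses the all-in-one benchmark harness (test API "transformer_int4_gpu"), not this exact script.

```python
# Minimal sketch, assuming the standard ipex-llm transformers-style API.
# Prompt and max_new_tokens are placeholders, not the benchmark settings.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "baichuan-inc/Baichuan2-7B-Chat"

# Load with 4-bit symmetric quantization (sym_int4) and move to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.padding_side = "left"  # left-pad for batched decoder-only generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# batch_size == 2; the report also fails with batch_size == 4.
prompts = ["What is AI?"] * 2
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)

for seq in output:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```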