intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

ipex-llm (0517) fails to run 'baichuan-inc/Baichuan2-7B-Chat' with batch_size==2 and batch_size==4 at input_length 32-32, 1024-128, and 2048-256

MargarettMao opened this issue

Transformers: 4.37.0
ipex-llm: 0517
Precision: sym_int4
Test API: "transformer_int4_gpu"
Device: arc11

ipex-llm (0516) could successfully run 'baichuan-inc/Baichuan2-7B-Chat' with batch_size==2 at input_length '32-32', '1024-128', and '2048-256', and with batch_size==4 at '32-32' and '1024-128' (but not '2048-256').
ipex-llm (0517) fails to run the model with batch_size==2 or batch_size==4 at any of these three input lengths.
Both versions work with batch_size==1 at all three input lengths.
(screenshot attached in the original issue)
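
For reference, below is a minimal repro sketch of the batched run described above, using ipex-llm's transformers-style Python API with sym_int4 on an Intel GPU (XPU). The prompt and output length are placeholder values; the original report uses the all-in-one benchmark harness (test API "transformer_int4_gpu"), not this exact script.

```python
# Minimal sketch, assuming the standard ipex-llm transformers-style API.
# Prompt and max_new_tokens are placeholders, not the benchmark settings.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "baichuan-inc/Baichuan2-7B-Chat"

# Load with 4-bit symmetric quantization (sym_int4) and move to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.padding_side = "left"  # left-pad for batched decoder-only generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# batch_size == 2; the report also fails with batch_size == 4.
prompts = ["What is AI?"] * 2
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)

for seq in output:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```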