intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Support Qwen-1.8B-Chat

Vincent131499 opened this issue

Thanks for the excellent work!
I am using Qwen-1.8B-Chat and ran into the following bug. After investigation, it appears the 1.8B variant is not yet supported. Could you add support for it?

```shell
python scripts/inference.py --model_name qwen -m qwen-1_8b-ne-q4_j.bin -c 512 -b 1024 -n 256 -t 20 --color -p "She opened the door and see"
```

```
Namespace(model_name='qwen', model=PosixPath('qwen-1_8b-ne-q4_j.bin'), build_dir=PosixPath('/code/llm-workspace/cpu-workspace/intel-extension-for-transformers/intel_extension_for_transformers/llm/runtime/graph/scripts/../build'), prompt='She opened the door and see', tokenizer='THUDM/chatglm-6b', n_predict=256, threads=20, batch_size_truncate=1024, ctx_size=512, seed=-1, repeat_penalty=1.1, color=True, keep=0, shift_roped_k=False, memory_f32=False, memory_f16=False, memory_auto=False)
cmd: [PosixPath('/code/llm-workspace/cpu-workspace/intel-extension-for-transformers/intel_extension_for_transformers/llm/runtime/graph/scripts/../build/bin/run_qwen'), '--model', PosixPath('qwen-1_8b-ne-q4_j.bin'), '--prompt', 'She opened the door and see', '--n-predict', '256', '--threads', '20', '--batch-size-truncate', '1024', '--ctx-size', '512', '--seed', '-1', '--repeat-penalty', '1.1', '--keep', '0', '--color', '--ids', '195, 660, 2255, 100, 2144, 102, 275, 130001, 130004, 196']
Welcome to use the qwen on the ITREX! 
main: seed  = 1705373295
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:0 AVX512_VNNI:1 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
model.cpp: loading model from qwen-1_8b-ne-q4_j.bin
init: n_vocab    = 151936
init: n_embd     = 2048
init: n_mult     = 11008
init: n_head     = 16
init: n_layer    = 24
init: n_rot      = 128
init: n_ff       = 5504
init: n_parts    = 1
MODEL_ASSERT: /code/llm-workspace/cpu-workspace/intel-extension-for-transformers/intel_extension_for_transformers/llm/runtime/graph/models/qwen/qwen.h:34: false
```
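The `MODEL_ASSERT(..., false)` at `qwen.h:34` most likely fires because the loader dispatches on hyperparameters that only match the 7B checkpoint. As a rough Python illustration (hypothetical code, not the actual ITREX source; the 1.8B shape comes from the `init:` log above, and the 7B shape from the published Qwen-7B config), a shape-based dispatch might look like:

```python
# Hypothetical sketch of hyperparameter-based model-variant dispatch,
# illustrating why an unrecognized shape hits the assert.
# The (n_embd, n_layer) pairs are assumptions: (2048, 24) is printed by
# the log above for 1.8B; (4096, 32) matches the published Qwen-7B config.

QWEN_VARIANTS = {
    (2048, 24): "Qwen-1.8B",
    (4096, 32): "Qwen-7B",
}

def classify_qwen(n_embd: int, n_layer: int) -> str:
    """Return the Qwen variant name, or fail like MODEL_ASSERT would."""
    try:
        return QWEN_VARIANTS[(n_embd, n_layer)]
    except KeyError:
        # Analogue of the MODEL_ASSERT(..., false) failure: the loader
        # reached a model shape it has no branch for.
        raise AssertionError(
            f"unsupported Qwen shape: n_embd={n_embd}, n_layer={n_layer}"
        )
```

If only the 7B shape were registered, the 1.8B header (`n_embd=2048`, `n_layer=24`) would fall through to the failure path; registering the additional shape is presumably the kind of change the fix requires.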

Looking forward to your reply.

Thanks for your support!
We need to insert it into our schedule; once it is supported, we will let you know.

@Vincent131499 You can try again with this pr

OK, I'll try it!