intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Qwen-7B TypeError: qwen_attention_forward() got an unexpected keyword argument 'registered_causal_mask'

juan-OY opened this issue

The model is based on Qwen 1.0. It once worked, but fails with the latest ipex-llm (2.1.0b20240521), installed by following the guide below:
https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen#1-install
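For reference, the install step in that guide boils down to creating a conda environment and installing the XPU build of ipex-llm, roughly as follows (paraphrased from the guide; defer to it for the exact OS-specific steps):

```
conda create -n llm python=3.11 libuv
conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```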

It reports an unexpected keyword argument 'registered_causal_mask'; the same code works with Qwen-7B-Chat.
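For context, a minimal sketch of how such a script drives Qwen through ipex-llm's Transformers-style API, per the linked example (the model path and prompt are placeholders; the reporter's actual generate_ipexllm.py may differ):

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Qwen/Qwen-7B-Chat"  # placeholder; the reporter loads a local Qwen 1.0 finetune

# load_in_4bit=True performs the "Converting the current model to sym_int4"
# step that appears in the log below
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # run on the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
    # generate() dispatches into the checkpoint's remote modeling_qwen.py,
    # which is where the 'registered_causal_mask' kwarg originates
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The failing run: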
python generate_ipexllm.py
C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
2024-05-22 21:34:06,278 - INFO - intel_extension_for_pytorch auto imported
2024-05-22 21:34:06,330 - WARNING - Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023. [The Chinese half of this warning repeats the same message: use the latest model and code, and take care not to use the wrong code and model if you started using Qwen-7B before September 25.]
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
2024-05-22 21:34:06,330 - WARNING - Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
2024-05-22 21:34:06,331 - WARNING - Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
2024-05-22 21:34:06,720 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
File "C:\multi-modality\cvte_qwen\ultra_test_code_and_data\benchmark_test2intel\generate_ipexllm.py", line 71, in
output = model.generate(input_ids,
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1330, in generate
return super().generate(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context return func(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 1588, in generate
return self.sample(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\transformers\generation\utils.py", line 2642, in sample
outputs = self(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 1120, in forward
transformer_outputs = self.transformer(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\ipex_llm\transformers\models\qwen.py", line 369, in qwen_model_forward
outputs = block(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Intel/.cache\huggingface\modules\transformers_modules\us_qwen_0435_r2-int4\modeling_qwen.py", line 653, in forward
attn_outputs = self.attn(
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\Intel\miniconda3\envs\qwen\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
TypeError: qwen_attention_forward() got an unexpected keyword argument 'registered_causal_mask'
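For context, the failure is a plain Python signature mismatch: the checkpoint's remote modeling_qwen.py passes a registered_causal_mask keyword to the attention block, while the forward that this ipex-llm build swaps in (qwen_attention_forward) does not declare that parameter. A self-contained illustration with stand-in functions (not the actual ipex-llm or Qwen code):

```python
# Stand-ins for the real code, showing the mechanism behind the TypeError.

def patched_attention_forward(hidden_states, attention_mask=None):
    # Like ipex-llm's replacement forward here: no 'registered_causal_mask' parameter.
    return hidden_states

try:
    # Older Qwen remote code passes the extra keyword unconditionally.
    patched_attention_forward("h", registered_causal_mask=None)
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'registered_causal_mask'

def tolerant_attention_forward(hidden_states, attention_mask=None, **kwargs):
    # Accepting and ignoring unknown kwargs is one way a patched forward
    # can stay compatible with multiple revisions of the remote code.
    return hidden_states

tolerant_attention_forward("h", registered_causal_mask=None)  # no error
```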

Sorry, I cannot reproduce this issue with qwen-7b-chat:

(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ pip install ipex-llm==2.1.0b20240521
Collecting ipex-llm==2.1.0b20240521
  Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl.metadata (5.0 kB)
Using cached ipex_llm-2.1.0b20240521-py3-none-manylinux2010_x86_64.whl (13.8 MB)
Installing collected packages: ipex-llm
  Attempting uninstall: ipex-llm
    Found existing installation: ipex-llm 2.1.0b20240522
    Uninstalling ipex-llm-2.1.0b20240522:
      Successfully uninstalled ipex-llm-2.1.0b20240522
Successfully installed ipex-llm-2.1.0b20240521
(changmin-llm) arda@arda-arc13:~/changmin/llm.cpp$ python qwen.py 
/home/arda/miniforge3/envs/changmin-llm/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-23 09:36:35,438 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|████████| 8/8 [00:00<00:00, 22.53it/s]
2024-05-23 09:36:35,965 - INFO - Converting the current model to sym_int4 format......
-------------------- Prompt --------------------

<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
AI是什么? [What is AI?]
<|im_end|>
<|im_start|>assistant

-------------------- Output --------------------

system
You are a helpful assistant.

user
AI是什么?

assistant
AI是人工智能的缩写,它是指模拟人类智能的技术和方法。它是研究如何让计算机像人一样思考、学习、理解和处理信息的
[English: AI is the abbreviation for artificial intelligence; it refers to technologies and methods that simulate human intelligence, i.e., the study of how to make computers think, learn, understand, and process information like a human. The output is cut off mid-sentence.]

Fixed in #11110.
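For anyone pinned to an affected build until the fix ships in a nightly, a stop-gap sketch (purely illustrative; it assumes, per the traceback, that the patched forward is installed as each attention module's forward under model.transformer.h[i].attn):

```python
import functools

def drop_unsupported_kwarg(fwd, name="registered_causal_mask"):
    """Wrap a forward so it silently discards one unsupported keyword."""
    @functools.wraps(fwd)
    def wrapper(*args, **kwargs):
        kwargs.pop(name, None)  # discard the kwarg the patched forward rejects
        return fwd(*args, **kwargs)
    return wrapper

# 'model' is the ipex-llm-converted Qwen model; transformer.h is Qwen 1.0's
# list of decoder blocks (visible in the traceback's modeling_qwen.py frames).
for block in model.transformer.h:
    block.attn.forward = drop_unsupported_kwarg(block.attn.forward)
```

Upgrading to a build that includes #11110 makes such a wrapper unnecessary.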