intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

TypeError: llama_model_forward_4_36() got an unexpected keyword argument 'cache_position' during inference with TinyLlama-1.1B-Chat-v1.0

lei-sun-intel opened this issue

Environment:
HW: MTL platform
OS: Ubuntu 22.04
intel-extension-for-pytorch==2.1.20+git0e2bee2
ipex-llm==2.1.0b20240423
transformers==4.41.1

Model:
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
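
For context, a minimal sketch of what generate_tinyllama-1.1b.py presumably does, following the standard ipex-llm GPU example pattern (a hypothetical reconstruction; the actual script and prompt formatting may differ):

import torch
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm drop-in replacement
from transformers import AutoTokenizer

model_path = "./TinyLlama-1.1B-Chat-v1.0"

# load_in_4bit=True triggers the "Converting the current model to sym_int4
# format" step seen in the log below.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # run on the MTL iGPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
    # model.generate is where the TypeError below is raised
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))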

Error:
(dev-zone) testv03@intel-Meteor-Lake-Client-Platform:~/.cache_dev_zone/notebooks/aigc_apps$ python generate_tinyllama-1.1b.py --repo-id-or-model-path ./TinyLlama-1.1B-Chat-v1.0 --prompt "What is AI?" --n-predict 32
2024-05-31 21:16:53,797 - INFO - intel_extension_for_pytorch auto imported
2024-05-31 21:16:53,996 - INFO - Converting the current model to sym_int4 format......
Traceback (most recent call last):
  File "/home/testv03/.cache_dev_zone/notebooks/aigc_apps/generate_tinyllama-1.1b.py", line 37, in <module>
    output = model.generate(input_ids,
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/ipex_llm/transformers/lookup.py", line 86, in generate
    return original_generate(self,
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/ipex_llm/transformers/speculative.py", line 108, in generate
    return original_generate(self,
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 1758, in generate
    result = self._sample(
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/transformers/generation/utils.py", line 2397, in _sample
    outputs = self(
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/testv03/.miniconda_dev_zone/envs/dev-zone/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: llama_model_forward_4_36() got an unexpected keyword argument 'cache_position'
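
The traceback suggests the root cause: ipex-llm patches LlamaModel.forward with llama_model_forward_4_36, a function written against the transformers 4.36 forward signature, while transformers 4.38+ passes the newly added cache_position keyword down to that call. A simplified, self-contained illustration of the mismatch (not the actual ipex-llm code):

# A forward written against the pre-4.38 signature (no cache_position):
def patched_forward(input_ids=None, attention_mask=None,
                    past_key_values=None, use_cache=None):
    return input_ids

# Newer transformers call sites pass cache_position=..., which the
# patched function was never written to accept:
patched_forward(input_ids=[[1, 2, 3]], cache_position=[0, 1, 2])
# -> TypeError: patched_forward() got an unexpected keyword argument 'cache_position'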

Hi @lei-sun-intel, our support for the Llama model currently only goes up to transformers==4.38.2; transformers==4.41.0 is not yet supported. You may refer to our documentation to install the latest version of ipex-llm.
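
At the time of writing, the documented command for the latest ipex-llm on Intel GPU looks roughly like the following (check the ipex-llm installation guide for the current form):

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/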

Hi @lei-sun-intel, could you please downgrade transformers to version 4.36.2 and try running your example again?
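
For example:

pip install transformers==4.36.2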

I have tried transformers==4.36.2 and 4.37.0, and both work.

Thanks a lot @sgwhat for your help.