intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

Failed to run Qwen2-VL with ipex-llm + A760

Zhiwei-Lii opened this issue

Describe the bug
Following the wiki:
https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/qwen2-vl

And ran the command below:
python3 ./generate.py --repo-id-or-model-path /Qwen2-VL-2B/ --prompt "how are you?" --n-predict 128 --modelscope --image-url-or-path /home/llm/sample.jpg
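For reference, the loading portion of such a script looks roughly like this (a sketch based on the documented ipex_llm.optimize_model API; the actual generate.py in the example may use ipex_llm.transformers wrappers and different arguments):

```python
# Rough sketch only; the sym_int4 conversion here matches the
# "Converting the current model to sym_int4 format" log line below.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from ipex_llm import optimize_model

model_path = "/Qwen2-VL-2B/"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.float16
)
model = optimize_model(model)   # low-bit (sym_int4 by default) conversion
model = model.half().to("xpu")  # run on the Intel GPU
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
```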

But met the error below:
(myenv) root@intel-rpl-p-poc:/home/llm# python3 ./generate.py --repo-id-or-model-path /Qwen2-VL-2B/ --prompt "how are you?" --n-predict 128 --modelscope --image-url-or-path /home/llm/sample.jpg
/home/llm/myenv/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
2025-03-25 02:59:10,581 - INFO - intel_extension_for_pytorch auto imported
2025-03-25 02:59:10,636 - INFO - set VIDEO_TOTAL_PIXELS: 90316800
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.15it/s]
2025-03-25 02:59:10,988 - INFO - Converting the current model to sym_int4 format......
/home/llm/myenv/lib/python3.11/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Traceback (most recent call last):
File "/home/llm/./generate.py", line 102, in
generated_ids = model.generate(
^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/ipex_llm/transformers/lookup.py", line 125, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/ipex_llm/transformers/speculative.py", line 127, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/ipex_llm/transformers/pipeline_parallel.py", line 283, in generate
return original_generate(self,
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2048, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 3001, in _sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/myenv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1774, in prepare_inputs_for_generation
if cache_position is None or (cache_position is not None and cache_position[0] == 0):
~~~~~~~~~~~~~~^^^
IndexError: index 0 is out of bounds for dimension 0 with size 0
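
The failing check can be reproduced standalone. The condition below mirrors the one in prepare_inputs_for_generation of modeling_qwen2_vl.py shown in the traceback; the empty tensor stands in for whatever the ipex-llm generate() wrappers ended up passing through:

```python
import torch

# An empty (rather than None) cache_position reproduces the crash:
# the Qwen2-VL check indexes element 0 of a zero-length tensor.
cache_position = torch.empty(0, dtype=torch.long)

if cache_position is None or (cache_position is not None and cache_position[0] == 0):
    pass
# -> IndexError: index 0 is out of bounds for dimension 0 with size 0
```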

How to reproduce
I almost exactly followed the wiki, but I'm in a Docker env.
https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/Multimodal/qwen2-vl

Hello, we are trying to reproduce your issue. Could you please share your Docker environment's information with us? It will help us reproduce your problem.
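
For example, the exact package versions inside the container would help; something like the following (a generic snippet, not an ipex-llm utility):

```python
# Dump the versions relevant to this stack; package names are the
# usual pip distribution names.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "torchvision", "transformers",
            "ipex-llm", "intel-extension-for-pytorch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```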