profile_llm.py fails with default arguments
rradjabi opened this issue
Describe the bug
Running profile_llm.py fails when run with the default arguments.
To Reproduce
Steps to reproduce the behavior:
1. cd script
2. python profile_llm.py
3. See error
Expected behavior
This should produce a profiling report for the LLM.
Screenshots
python .\profile_llm.py
Profiling TinyLlama/TinyLlama-1.1B-Chat-v1.0 with context size 128
Traceback (most recent call last):
File "C:\workspace\npu-accel-try2\intel-npu-acceleration-library\script\profile_llm.py", line 153, in <module>
main(
File "C:\workspace\npu-accel-try2\intel-npu-acceleration-library\script\profile_llm.py", line 52, in main
intel_npu_acceleration_library.nn.llm.warm_up_decoder_model(
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 415, in warm_up_decoder_model
for _ in results:
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\utils\_contextlib.py", line 56, in generator_context
response = gen.send(request)
^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 353, in generate_with_static_shape
out = model(
^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1208, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1018, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 741, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 226, in forward
causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 127 is out of bounds for dimension 0 with size 1
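For reference, the failing slice can be reproduced in isolation. A minimal sketch, assuming (per the error message) that the mask's indexed dimension has collapsed to size 1 while cache_position still spans the full 128-token context; the shapes are illustrative, not the library's actual ones:

import torch

# Illustrative shapes only: dim 2 of the mask has size 1, but the
# advanced index runs 0..127, so the lookup overruns it and raises
# an IndexError analogous to the one in llm.py above.
causal_mask = torch.zeros(1, 1, 1, 128)
cache_position = torch.arange(128)
causal_mask[:, :, cache_position, :128]  # raises IndexError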
Desktop (please complete the following information):
- OS: Windows 11
Additional context
With default arguments, kv-caching is enabled. If I explicitly disable kv-caching by setting use_past to False in warm_up_decoder_model and generate_with_static_shape, then it passes.
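A minimal sketch of that workaround, assuming use_past is exposed as a keyword argument (the reporter may instead have edited the default inside the two functions); tokenizer, model, and context_size are placeholders for whatever profile_llm.py already builds, not the verified signature:

from intel_npu_acceleration_library.nn.llm import warm_up_decoder_model

# Only use_past=False is the change; the other arguments are placeholders
# for the values profile_llm.py constructs before warm-up.
warm_up_decoder_model(
    tokenizer, model, context_size,
    use_past=False,  # disable kv-caching, per the workaround above
)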
What version of the transformers library are you using? They recently changed the interface, so this might be related to that.
Transformers version: 4.39.3
Can you try with the latest one, v4.40.1?
Upon upgrading with pip install transformers==4.40.1, I get the following dependency error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
optimum 1.19.0 requires transformers[sentencepiece]<4.40.0,>=4.26.0, but you have transformers 4.40.1 which is incompatible.
optimum-intel 1.16.0 requires transformers<4.40.0,>=4.36.0, but you have transformers 4.40.1 which is incompatible.
Can this be ignored?
I think 4.40.0 is fine as well.
Please update the library to the latest version to fix this issue.