Intel® NPU Acceleration Library


profile_llm.py fails with default arguments

rradjabi opened this issue

Describe the bug
Running profile_llm.py with the default arguments fails.

To Reproduce
Steps to reproduce the behavior:

  1. cd script
  2. python profile_llm.py
  3. See error

Expected behavior
This should produce the profile report for LLM.

Screenshots

python .\profile_llm.py
Profiling TinyLlama/TinyLlama-1.1B-Chat-v1.0 with context size 128
Traceback (most recent call last):
  File "C:\workspace\npu-accel-try2\intel-npu-acceleration-library\script\profile_llm.py", line 153, in <module>
    main(
  File "C:\workspace\npu-accel-try2\intel-npu-acceleration-library\script\profile_llm.py", line 52, in main
    intel_npu_acceleration_library.nn.llm.warm_up_decoder_model(
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 415, in warm_up_decoder_model
    for _ in results:
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\utils\_contextlib.py", line 56, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 353, in generate_with_static_shape
    out = model(
          ^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1208, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1018, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 741, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\.conda\envs\mtl-npu-accel-try2\Lib\site-packages\intel_npu_acceleration_library\nn\llm.py", line 226, in forward
    causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
                  ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 127 is out of bounds for dimension 0 with size 1
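
For context, a minimal illustration of the failure mode suggested by the traceback; the shapes below are assumptions chosen to mirror the error, not taken from the library. With kv-caching enabled, the causal mask built for a single-token decode step has a query dimension of size 1, so indexing it with cache positions running up to 127 is out of bounds:

import torch

# Assumed shapes mirroring a decode step: [batch, heads, q_len=1, kv_len=128]
causal_mask = torch.zeros(1, 1, 1, 128)
cache_position = torch.arange(128)  # positions 0..127 for a 128-token context

try:
    # Same indexing pattern as llm.py line 226: dimension 2 has size 1,
    # so any cache position above 0 is out of bounds
    _ = causal_mask[:, :, cache_position, :128]
except IndexError as e:
    print(e)  # index 127 is out of bounds for dimension 0 with size 1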

Desktop (please complete the following information):

  • OS: Windows 11

Additional context
With the default arguments, kv-caching is enabled. If I explicitly disable kv-caching by setting use_past to False in warm_up_decoder_model and generate_with_static_shape, the script runs to completion (see the sketch below).
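
A hedged sketch of that workaround, based only on the call sites visible in the traceback; the positional arguments and their order are assumptions, not the library's documented signature:

from transformers import AutoModelForCausalLM, AutoTokenizer
from intel_npu_acceleration_library.nn.llm import warm_up_decoder_model

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
context_size = 128  # matches the default reported by the script

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# use_past=False disables the kv-cache path; the traceback shows warm-up
# iterating generate_with_static_shape, so the flag must reach both helpers
warm_up_decoder_model(tokenizer, model, context_size, use_past=False)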

What version of the transformers library are you using? They recently changed the interface, so this might be related to that.
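
For reference, a quick way to print the installed version:

python -c "import transformers; print(transformers.__version__)"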

Hi @alessandropalla

Transformers version: 4.39.3

Can you try with the latest one, v4.40.1?

Upon upgrading with pip install transformers==4.40.1, I get the following dependency error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
optimum 1.19.0 requires transformers[sentencepiece]<4.40.0,>=4.26.0, but you have transformers 4.40.1 which is incompatible.
optimum-intel 1.16.0 requires transformers<4.40.0,>=4.36.0, but you have transformers 4.40.1 which is incompatible.

Can this be ignored?

I think 4.40.0 is fine as well.
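
For reference, pinning that version is a one-liner; note that the optimum and optimum-intel constraints above still require transformers<4.40.0, so the same resolver warning will appear:

pip install "transformers==4.40.0"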

Please update the library to the latest version to fix this issue.