qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ

AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

leszekhanusz opened this issue

I would like to use the TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ model on my RTX3080 with 10GB of VRAM, using oobabooga/text-generation-webui.

It works well at first, but once the prompt grows too large, generation fails because the GPU runs out of VRAM.

So I'm trying to run the model with pre_layer set to 20 to offload part of the model to the CPU, but when I do that I get the following error:

Traceback (most recent call last):
  File "/mnt/tera/git-repos/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/mnt/tera/git-repos/text-generation-webui/modules/text_generation.py", line 267, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/tera/git-repos/text-generation-webui/repositories/GPTQ-for-LLaMa/llama_inference_offload.py", line 154, in forward
    layer_outputs = decoder_layer(
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 293, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/mnt/tera/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/tera/git-repos/text-generation-webui/repositories/GPTQ-for-LLaMa/quant/fused_attn.py", line 154, in forward
    attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, is_causal=is_causal)
AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

My pytorch version in that Python environment is 1.13.1, and I believe torch.nn.functional.scaled_dot_product_attention is only available in pytorch 2.0 or newer.
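
For anyone who can't upgrade right away, here is a minimal sketch of a version guard that falls back to explicit attention on torch 1.x. This is not the actual code in fused_attn.py, and the helper name sdpa_compat is hypothetical; it only illustrates that the fused kernel can be used when present and emulated otherwise:

import math
import torch
import torch.nn.functional as F

def sdpa_compat(query, key, value, is_causal=False):
    # Use the fused kernel when it exists (torch >= 2.0).
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)
    # Fallback for torch 1.x: explicit attention over (batch, heads, seq, dim) tensors.
    scale = 1.0 / math.sqrt(query.size(-1))
    attn_weights = torch.matmul(query, key.transpose(-2, -1)) * scale
    if is_causal:
        q_len, k_len = query.size(-2), key.size(-2)
        causal_mask = torch.ones(q_len, k_len, dtype=torch.bool, device=query.device).tril()
        attn_weights = attn_weights.masked_fill(~causal_mask, float("-inf"))
    attn_weights = F.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    return torch.matmul(attn_weights, value)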

Note that I had to point the GPTQ-for-LLaMa checkout in repositories/GPTQ-for-LLaMa at a recent commit of the qwopqwop200 repo to make this model work.

What should I do? Is it safe to upgrade to pytorch 2.0, or will it cause other issues?

Note: I've also reported this issue on text-generation-webui.

A simple pip install --upgrade torch to bring torch up to 2.x solved the problem.
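
If you do the same inside the textgen conda environment, a quick sanity check after the upgrade (with the environment activated) confirms that the new torch is 2.x, that CUDA is still usable, and that the missing attribute now exists:

python -c "import torch; print(torch.__version__, torch.cuda.is_available(), hasattr(torch.nn.functional, 'scaled_dot_product_attention'))"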