magic-research / PLLaVA

Official repository for the paper PLLaVA

Error with Flash Attention 2 while testing the 34b demo

zhangchunjie1999 opened this issue

When I run the command to test the 34b model demo:

`bash scripts/demo.sh MODELS/pllava-34b MODELS/pllava-34b`

the following error occurs:
```
Traceback (most recent call last):
  File "/data/miniconda3/envs/pllava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/miniconda3/envs/pllava/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/group/40010/esmezhang/PLLaVA/tasks/eval/demo/pllava_demo.py", line 246, in <module>
    chat = init_model(args)
  File "/group/40010/esmezhang/PLLaVA/tasks/eval/demo/pllava_demo.py", line 29, in init_model
    model, processor = load_pllava(
  File "/group/40010/esmezhang/PLLaVA/tasks/eval/model_utils.py", line 53, in load_pllava
    model = PllavaForConditionalGeneration.from_pretrained(repo_id, config=config, torch_dtype=torch.bfloat16)
  File "/data/miniconda3/envs/pllava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3552, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/group/40010/esmezhang/PLLaVA/models/pllava/modeling_pllava.py", line 295, in __init__
    self.language_model = AutoModelForCausalLM.from_config(config.text_config, torch_dtype=config.torch_dtype, attn_implementation="flash_attention_2")
  File "/data/miniconda3/envs/pllava/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 437, in from_config
    return model_class._from_config(config, **kwargs)
  File "/data/miniconda3/envs/pllava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1385, in _from_config
    config = cls._autoset_attn_implementation(
  File "/data/miniconda3/envs/pllava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1454, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/data/miniconda3/envs/pllava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1557, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} Flash Attention 2 is not available. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
```
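
Since the stack ends in an ImportError, a first step is to confirm whether the flash-attn package can be imported at all in this environment. This is only a minimal check, independent of the PLLaVA code:

```python
# Minimal check (not part of the repo) of whether the flash-attn package
# that transformers looks for is importable in the current environment.
import importlib.util

if importlib.util.find_spec("flash_attn") is None:
    print("flash-attn is not installed in this Python environment.")
else:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
```

If the package is missing, the Hugging Face documentation page linked in the error message covers installation (typically `pip install flash-attn --no-build-isolation`, which needs a CUDA toolkit compatible with the installed PyTorch build, cu118 here).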

Running `transformers-cli env` in the terminal gives the following output:

```
- transformers version: 4.40.2
- Platform: Linux-4.14.105-1-tlinux3-0013-x86_64-with-glibc2.17
- Python version: 3.10.14
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.30.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
```

How can I fix it?
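
For reference, the traceback shows that `attn_implementation="flash_attention_2"` is hard-coded where the language model is created in `models/pllava/modeling_pllava.py`, so the error is raised even though the demo command never asks for flash attention explicitly. Below is a sketch of one possible local workaround, not a fix confirmed in this thread: guard that call with transformers' availability check and fall back to PyTorch's built-in SDPA attention.

```python
# Sketch of a possible local edit in models/pllava/modeling_pllava.py (the
# __init__ line shown in the traceback); `self` and `config` come from
# PllavaForConditionalGeneration.__init__. Assumption: falling back to
# PyTorch's SDPA attention is acceptable when flash-attn is not installed.
from transformers import AutoModelForCausalLM              # already used in this file
from transformers.utils import is_flash_attn_2_available   # add near the other imports

attn_impl = "flash_attention_2" if is_flash_attn_2_available() else "sdpa"
self.language_model = AutoModelForCausalLM.from_config(
    config.text_config,
    torch_dtype=config.torch_dtype,
    attn_implementation=attn_impl,
)
```

SDPA ships with PyTorch 2.x and needs no extra package; installing flash-attn as described in the linked documentation keeps the original code path.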

The hardware is two 40 GB A100 GPUs.

Hi, how did you fix this?

I've received your email. Best wishes!

Hi, have you solved the problem? I encountered the same error when running demo.py. If you have solved it, could you please share your solution? Thank you.