horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627


Error occurs when pruning LLaMa2-7b

moonlightian opened this issue · comments

With a command like:
CUDA_VISIBLE_DEVICES=0 python hf_prune.py --base_model path_to_cached_hf_llama2-7b --pruning_ratio 0.25 --device cpu --eval_device cuda --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 --block_attention_layer_start 4 --block_attention_layer_end 30 --pruner_type taylor --test_after_train --taylor param_first --save_model
it throws an error: "addmm_impl_cpu_" not implemented for 'Half'
[screenshot: error traceback]

torch==2.0.0
transformers==4.31.0
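
For context, this first error is PyTorch complaining that half-precision matrix multiplication is not implemented on CPU (true at least up to torch 2.0). A minimal sketch reproducing it, independent of LLM-Pruner:

    import torch

    # fp16 Linear on CPU: the underlying addmm kernel has no Half implementation,
    # so this raises: "addmm_impl_cpu_" not implemented for 'Half'
    layer = torch.nn.Linear(4, 4).half()
    x = torch.randn(2, 4, dtype=torch.float16)
    y = layer(x)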

Removing --device cpu solved that, but it still fails with a shape error:
[screenshot: shape-mismatch traceback]

Hi. Did you modify the code for loading Llama-2?

Yes, I modified the code that loads the model from

[screenshot: original model-loading code]

to

[screenshot: modified model-loading code]

because I was unable to load the model with the original code.

Ah, the model code for Llama-2 needs to be modified to account for the updated attributes. Some of the dimension calculations are hard-coded in the official modeling code, which makes it unsuitable for inference with the pruned model.
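
For reference, the failure point is the reshape inside the attention forward, which uses head counts derived from the original config. Roughly (paraphrased from transformers 4.31's LlamaAttention.forward, not verbatim):

    # hidden_states: (bsz, q_len, hidden_size)
    query_states = self.q_proj(hidden_states)
    # self.num_heads still reflects the unpruned config, so this view
    # fails with a shape error once q_proj's out_features has been pruned.
    query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)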

Two ways to solve this bug:

  1. Modify the fixed attribute in modeling_llama.py. The problematic attribute is self.num_key_value_heads, and you can set it manually (an example follows; a sanity check is sketched after this list):

     for layer in model.model.layers:
         layer.self_attn.num_heads = layer.self_attn.q_proj.weight.data.shape[0] // layer.self_attn.head_dim
         # Non-GQA LLaMA models have num_key_value_heads == num_heads, so keep it consistent:
         layer.self_attn.num_key_value_heads = layer.self_attn.num_heads

  2. Use the code in this repo to load the model. I'm not sure why loading it was unsuccessful. If possible, could you paste the error message here?
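
If you want to confirm the patch took effect before running inference, a quick sanity check (a sketch, assuming a Hugging Face LlamaForCausalLM named model, as in hf_prune.py):

    for i, layer in enumerate(model.model.layers):
        attn = layer.self_attn
        q_out = attn.q_proj.weight.data.shape[0]
        # After pruning, q_proj's output dim must equal num_heads * head_dim,
        # or the view in the attention forward raises the shape error above.
        assert q_out == attn.num_heads * attn.head_dim, (
            f"layer {i}: q_proj out_features {q_out} != "
            f"{attn.num_heads} * {attn.head_dim}"
        )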

Thank you for your kind advice! It finally worked.

If you use --device cpu together with --save_model, the model is cast to half precision in the code fragment below, which is why you got the first error.
[screenshot: save fragment in hf_prune.py calling model.half()]

You just need to add model.float() after the torch.save call, like this, and the first error is solved.
[screenshot: save fragment with model.float() added after torch.save]
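
For readers who can't see the screenshots, a paraphrased sketch of that save fragment with the fix applied (the checkpoint path variable is hypothetical, not LLM-Pruner's exact name):

    if args.save_model:
        model.half()                  # cast to fp16 to shrink the checkpoint on disk
        torch.save({
            'model': model,
            'tokenizer': tokenizer,
        }, checkpoint_path)           # hypothetical output path
        model.float()                 # cast back so later CPU evaluation avoids the Half error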