Error occurs when pruning LLaMa2-7b
moonlightian opened this issue · comments
With a cmd like:

```shell
CUDA_VISIBLE_DEVICES=0 python hf_prune.py --base_model path_to_cached_hf_llama2-7b --pruning_ratio 0.25 --device cpu --eval_device cuda --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 --block_attention_layer_start 4 --block_attention_layer_end 30 --pruner_type taylor --test_after_train --taylor param_first --save_model
```
it throws an error: `"addmm_impl_cpu_" not implemented for 'Half'`.

Environment:
- torch==2.0.0
- transformers==4.31.0
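For context, the error comes from PyTorch itself rather than the pruning code: in torch 2.0 the CPU backend has no half-precision `addmm` kernel, so any fp16 `nn.Linear` evaluated on `cpu` fails with this message. A minimal sketch of the usual workaround (not the repo's code) is to cast to float32 for the CPU-side computation:

```python
import torch

# fp16 matmul kernels may be missing on CPU (e.g. torch 2.0), raising:
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# A common workaround is to cast to float32 before running on CPU.
x = torch.randn(2, 4, dtype=torch.float16)
w = torch.randn(3, 4, dtype=torch.float16)

# Cast inputs (or the whole model, via model.float()) to fp32 on CPU:
y = torch.nn.functional.linear(x.float(), w.float())
print(tuple(y.shape))  # (2, 3)
```

The same idea applies at the model level: `model.float()` before CPU evaluation, or keep `--device cuda` so the fp16 kernels are available.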
Hi. Did you modify the code for loading Llama2?
Ahh.. The model code for Llama2 needs to be modified to satisfy the updated attributes. Some of the dimension calculations are hard-coded in the official code, which makes it unsuitable for inference with the pruned model.
Two ways to solve this bug:
- Modify the fixed attributes in modeling_llama.py. The problematic attributes are `self.num_heads` and `self.num_key_value_heads`; you can recompute them manually from the pruned projection weights (below is an example):

```python
for layer in model.model.layers:
    layer.self_attn.num_heads = layer.self_attn.q_proj.weight.data.shape[0] // layer.self_attn.head_dim
    layer.self_attn.num_key_value_heads = layer.self_attn.k_proj.weight.data.shape[0] // layer.self_attn.head_dim
```
- Use the code in this repo to load the model. I'm not sure why loading the model was unsuccessful. If possible, could you post the error message here?
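To illustrate why the first fix is needed, here is a small self-contained sketch (toy dimensions, not the repo's code): attention reshapes the `q_proj` output into `(num_heads, head_dim)`, so if rows of `q_proj` are pruned but `num_heads` still holds the original config value, the reshape fails; recomputing the head count from the pruned weight restores consistency.

```python
import torch

# Toy setup: 4 heads of dim 8, so q_proj projects to 32 features.
head_dim, num_heads = 8, 4
q_proj = torch.nn.Linear(32, num_heads * head_dim, bias=False)

# Prune one whole head (8 output rows) from q_proj:
q_proj.weight.data = q_proj.weight.data[: 3 * head_dim]
q_proj.out_features = 3 * head_dim

x = torch.randn(1, 5, 32)
q = q_proj(x)  # shape (1, 5, 24) after pruning
try:
    q.view(1, 5, num_heads, head_dim)  # stale num_heads -> shape mismatch
except RuntimeError:
    print("stale head count fails")

# Recompute the head count from the pruned weight, as in the fix above:
num_heads = q_proj.weight.data.shape[0] // head_dim
print(q.view(1, 5, num_heads, head_dim).shape)  # torch.Size([1, 5, 3, 8])
```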
Thank you for your kind advice! It finally worked.