horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627

Calculating Importance of 'param_mix'

kiucho opened this issue

Hello. First of all, thank you for sharing great research.

I have a question about calculating the importance of parameters.

In the TaylorImportance class in hf_llama_pruner.py, line 274:

  1. Could you please tell me why the importance for the mixed-order case is calculated as follows:

salience = salience - 0.5 * layer.weight * layer.weight.acc_grad * layer.weight

(and not as the sum of the 1st- and 2nd-order terms)?

  2. Are higher-order terms neglected?

Hi kiucho,

  • For Question 1:

The derivation of this is (e.g., for Eq. 5 in the paper):

[image: Taylor-expansion derivation of the importance]

where

[image: definition of the terms in the expansion]

And thus, here, the second (Hessian) term is subtracted from the first-order term.
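
For readers without the images, the expansion being referenced looks roughly like the following (a sketch; the notation may differ from the paper, and reading acc_grad as the accumulated squared gradient / Fisher-style estimate of the Hessian diagonal is an assumption of this sketch):

```latex
% Sketch of the second-order Taylor expansion behind Eq. 5
% (notation may differ slightly from the paper).
% H_{kk} is the Hessian diagonal; in the code it is assumed to be
% approximated by the accumulated squared gradient (acc_grad).
\[
\Delta\mathcal{L}
  = \mathcal{L}_{W_k} - \mathcal{L}_{W_k = 0}
  \approx \underbrace{\frac{\partial \mathcal{L}}{\partial W_k}\, W_k}_{\text{first-order term}}
  \;-\; \underbrace{\frac{1}{2}\, W_k\, H_{kk}\, W_k}_{\text{second-order (Hessian) term}}
  \;+\; \mathcal{O}\!\left(\lVert W_k \rVert^{3}\right),
\qquad
H_{kk} \approx \sum_{i}\left(\frac{\partial \mathcal{L}_i}{\partial W_k}\right)^{2}
\]
```

Evaluating the loss at W_k = 0 and taking the difference leaves the first-order term minus half the Hessian term, which is exactly what the param_mix line computes.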

There is a mistake in the first version of our paper, so please refer to our code. We have uploaded a new version of the paper to arXiv (I'm not sure exactly when it will be released, but I expect it to be available within the next 24 hours).

  • For Question 2:

Yes, we can neglect the higher-order terms because their impact is negligible: they are small in scale compared to the preceding terms. This is primarily because the first-order term always dominates, given that the model is consistently not fully converged on our calibration samples (as evidenced by the large loss observed during the pruning process).
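
To make the computation concrete, here is a minimal sketch of the per-parameter salience under that truncated expansion. The names weight, grad, and acc_grad mirror the line quoted in the question; treating acc_grad as the accumulated squared gradient (a stand-in for the Hessian diagonal) is an assumption of this sketch, and the function is an illustration rather than the exact code in hf_llama_pruner.py.

```python
import torch


def param_mix_salience(weight: torch.Tensor,
                       grad: torch.Tensor,
                       acc_grad: torch.Tensor) -> torch.Tensor:
    """Per-parameter importance from the truncated Taylor expansion.

    weight   : the parameter W_k
    grad     : dL/dW_k accumulated over the calibration samples
    acc_grad : accumulated squared gradient, used here as a Fisher-style
               approximation of the Hessian diagonal (assumed semantics)
    """
    # First-order term: (dL/dW) * W
    salience = weight * grad
    # Subtract the second-order (Hessian) term, 0.5 * W * H_diag * W,
    # matching the line quoted in the question; higher-order terms of the
    # expansion are dropped.
    salience = salience - 0.5 * weight * acc_grad * weight
    return salience
```

In the pruner itself, these per-parameter scores are then aggregated along the pruned dimension into group-level importances, but that step is separate from the expansion discussed here.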

Thank you for your kind explanation. I checked the new version of your paper. Thanks once again.