horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627

Eval Loss NaN on Llama-2

mmichaelzhang opened this issue · comments

Hi,

By any chance, have you actually tried running this on the Llama-2 model?

I tried using the default LLaMA parameters for pruning and post-training, which resulted in a similar WikiText-2 score (~19) but a much worse PTB score (~70).
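For reference, these perplexities follow the usual strided evaluation recipe; the sketch below is a generic version of it, not necessarily the exact evaluation script in this repo, and the model path is a placeholder:

```python
# Generic strided-perplexity sketch (illustrative only; swap in the pruned
# checkpoint for `model_name` when evaluating the pruned model).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
enc = tokenizer(text, return_tensors="pt")

stride, max_len, seq_len = 512, 2048, enc.input_ids.size(1)
nlls, scored = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_len, seq_len)
    trg_len = end - scored                      # tokens newly scored in this window
    input_ids = enc.input_ids[:, begin:end].to(model.device)
    labels = input_ids.clone()
    labels[:, :-trg_len] = -100                 # mask the overlapping prefix
    with torch.no_grad():
        nlls.append(model(input_ids, labels=labels).loss * trg_len)
    scored = end
    if end == seq_len:
        break

print("perplexity:", torch.exp(torch.stack(nlls).sum() / scored).item())
```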

Also, when running post-training with the LLaMA parameter set, the Llama-2 loss explodes after ~0.2 epochs. I tried a smaller learning rate (1e-5), yet the eval loss still exploded to NaN.
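One way to catch the explosion early is a callback that stops the run as soon as the logged loss goes non-finite. This is only a sketch, assuming post_training.py goes through the Hugging Face `Trainer`; the callback name is hypothetical:

```python
# Hypothetical guard: stop training once the logged loss is no longer finite,
# instead of letting the eval loss reach NaN.
import math
from transformers import TrainerCallback

class StopOnNonFiniteLoss(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and not math.isfinite(loss):
            print(f"Non-finite loss at step {state.global_step}; stopping training.")
            control.should_training_stop = True
        return control

# Usage (assuming a standard Trainer setup):
# trainer = Trainer(..., callbacks=[StopOnNonFiniteLoss()])
```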

It would be of great help if you could provide some insights on both pruning and post-training parameters.

Thanks.

Greetings!

Regarding the PTB results after pruning, the worse score is inherited from the base model: the unpruned Llama-2-7B already scores approximately 47 on PTB, significantly higher than LLaMA-7B (~22).

As for the NaN issue during post-training, we encountered the same problem you reported. We are currently searching for appropriate hyper-parameters to fine-tune the pruned model. If we obtain any new findings or find any bugs in our code, we will update you promptly.
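For concreteness, the knobs we are looking at are the usual stabilization settings; the values below are placeholder assumptions, not final recommendations:

```python
# Illustrative only: typical stabilization settings to sweep when fine-tuning
# the pruned model. All values are placeholders, not a verified configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tune_log/llama2_prune",    # hypothetical path
    learning_rate=5e-5,                    # lower than the LLaMA-1 default
    warmup_steps=100,                      # longer warmup before reaching full lr
    max_grad_norm=1.0,                     # gradient clipping
    bf16=True,                             # bfloat16 keeps float32's exponent range
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=200,
)
```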

Thank you for the timely reply! Hope to get back with good news.

Cheers.

@mmichaelzhang Have you resolved this issue? I also observed the training loss explosion and saw the performance of Llama-2-7B deteriorate when using the default LLaMA settings:

| WikiText-2 (w/o tune) | PTB (w/o tune) | BoolQ Acc | PIQA Acc | HellaSwag Acc_norm | WinoGrande Acc | ARC-e Acc | ARC-c Acc_norm | OBQA Acc_norm |
|---|---|---|---|---|---|---|---|---|
| 19.24 | 72.61 | 37.83 | 52.34 | 26.64 | 49.41 | 25.08 | 27.82 | 28.40 |

Since the pruned model weights are quantized to int8 and frozen during post-training, I think the phenomenon is unrelated to the BF16/FP16 dtype, which the authors point to as the cause:

> Tip: Training LLaMA-2 in float16 is not recommended and is known to produce nan; as such, the model should be trained in bfloat16.
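For context, the float16 failure mode that tip refers to is range overflow: float16 tops out around 6.5e4, so a large activation or loss spike becomes inf and then NaN, whereas bfloat16 keeps float32's exponent range. A quick standalone check:

```python
# Standalone illustration of the float16 vs bfloat16 range difference.
import torch

print(torch.finfo(torch.float16).max)    # 65504.0
print(torch.finfo(torch.bfloat16).max)   # ~3.4e38 (same exponent range as float32)

x = torch.tensor([70000.0])
print(x.to(torch.float16))               # overflows to inf
print(x.to(torch.bfloat16))              # stays finite (just lower precision)
print(x.to(torch.float16) - x.to(torch.float16))  # inf - inf -> nan
```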