horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627


Pruning Llama2-7B

acalatrava opened this issue · comments

I’ve tried to prune Llama2-7B on a MacBook Pro M1, but the system ends the run by killing the process because of OOM (I have 32 GB).

Is there something I can do? Has somebody pruned this model and published it?

Thank you!

Hi.

Pruning needs around 80 GB of memory if you use the Taylor pruner, since it has to compute gradients of the model.
If you use other pruners, like L2 or random, the memory requirement is much lower; however, the performance of those pruners is not as good.
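The memory gap comes from what each importance score needs. A toy sketch (not LLM-Pruner's actual code; function names and numbers are illustrative) of the two criteria mentioned above:

```python
# Toy sketch contrasting the two pruning criteria discussed above.
# Taylor importance |sum_i w_i * dL/dw_i| needs a backward pass to get
# gradients (hence the ~80 GB footprint for a 7B model), while L2
# importance is computed from the weights alone.

def l2_importance(weights):
    """Gradient-free: L2 norm of a structural group's weights."""
    return sum(w * w for w in weights) ** 0.5

def taylor_importance(weights, grads):
    """First-order Taylor importance: |sum_i w_i * g_i| for the group."""
    return abs(sum(w * g for w, g in zip(weights, grads)))

group_weights = [0.5, -1.2, 0.3]
group_grads = [0.1, 0.02, -0.4]   # would come from a backward pass

print(l2_importance(group_weights))                  # weights only
print(taylor_importance(group_weights, group_grads)) # needs gradients too
```

Because the Taylor score multiplies each weight by its gradient, the full gradient tensors must be held in memory alongside the weights, which is what pushes the requirement far beyond the model size itself.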

@horseee Can I use multiple GPU to prune Llama2-7B?
I have 4 A40. hf_prune.py doesn't seem to use multiple GPU.
Thank you!


Hi, did you fix the problem? I also encountered a similar one.