horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

Home Page: https://arxiv.org/abs/2305.11627


Pruning Llama2-7B

acalatrava opened this issue · comments

I’ve tried to prune Llama2-7B on a MacBook Pro M1, but the system ends the run by killing the process because of OOM (I have 32 GB).

Is there something I can do? Has somebody pruned this model and published it?

Thank you!

Hi.

Pruning needs around 80 GB of memory if you use the Taylor pruner, since it has to compute gradients of the model.
If you use other pruners, like L2 or random, the memory requirement is much lower; however, the performance of those pruners is not as good.
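The memory gap comes from what each importance score needs. A toy sketch (not LLM-Pruner's actual code; function names and numbers are illustrative) of the two criteria mentioned above:

```python
# Toy sketch contrasting the two pruning criteria discussed above.
# Taylor importance |sum_i w_i * dL/dw_i| needs a backward pass to get
# gradients (hence the ~80 GB footprint for a 7B model), while L2
# importance is computed from the weights alone.

def l2_importance(weights):
    """Gradient-free: L2 norm of a structural group's weights."""
    return sum(w * w for w in weights) ** 0.5

def taylor_importance(weights, grads):
    """First-order Taylor importance: |sum_i w_i * g_i| for the group."""
    return abs(sum(w * g for w, g in zip(weights, grads)))

group_weights = [0.5, -1.2, 0.3]
group_grads = [0.1, 0.02, -0.4]   # would come from a backward pass

print(l2_importance(group_weights))                  # weights only
print(taylor_importance(group_weights, group_grads)) # needs gradients too
```

Because the Taylor score multiplies each weight by its gradient, the full gradient tensors must be held in memory alongside the weights, which is what pushes the requirement far beyond the model size itself.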

@horseee Can I use multiple GPU to prune Llama2-7B?
I have 4 A40. hf_prune.py doesn't seem to use multiple GPU.
Thank you!


Hi, did you fix the problem? I also encountered a similar one.