horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.

https://arxiv.org/abs/2305.11627

The quantization of the compressed models

lihuang258 opened this issue 4 months ago

Liguangyan @UCAS commented 4 months ago

If I want to further quantize the pruned model, how should I proceed? I saw quantization mentioned in the paper.
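A rough illustration only. This is not the LLM-Pruner authors' method and not a maintainer reply. Structural pruning removes whole rows, columns, or heads, so each pruned layer is still an ordinary dense matrix, just a smaller one. That means a generic post-training weight quantizer should apply to it unchanged. The sketch assumes symmetric, per-output-row, round-to-nearest int8 quantization on plain Python lists. `quantize_rows_int8`, `dequantize_rows`, and the toy weights are hypothetical names used for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows_int8(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row round-to-nearest int8 quantization (illustrative sketch)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max((abs(v) for v in row), default=0.0)
        # Map the largest magnitude in this row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer step and clamp into the symmetric int8 range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w ~ q * scale, per row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    # Hypothetical 3x3 layer; structural pruning drops output row 1 entirely.
    dense = [[0.5, -1.0, 0.25], [0.1, 0.2, 0.3], [2.0, 0.0, -0.5]]
    kept_rows = [0, 2]
    pruned = [dense[i] for i in kept_rows]  # still a dense (2x3) matrix

    q, scales = quantize_rows_int8(pruned)
    recon = dequantize_rows(q, scales)
    max_err = max(abs(a - b) for r, rr in zip(pruned, recon) for a, b in zip(r, rr))
    print(q, scales, max_err)
```

With a real PyTorch checkpoint, the same idea would be applied to each pruned linear layer's weight tensor. In practice an established post-training quantization method (for example GPTQ) would likely preserve accuracy better than plain round-to-nearest.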