Pruning MQA?
jianyuheng opened this issue
jianyuheng commented
How do I prune LLMs that use Multi-Query Attention (MQA)?
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support LLaMA, Llama-2, BLOOM, Vicuna, Baichuan, etc.