psunlpgroup / D-Pruner

[NAACL Findings 2024] Pruning as a Domain-specific LLM Extractor. Supports LLaMA2.

D-Pruner

Code for 2024 NAACL Findings paper Pruning as a Domain-specific LLM Extractor.

Navigation: Overview, Datasets, Models and Experiments, Acknowledgement, Citation

Overview

We introduce D-Pruner, an unstructured dual-pruning methodology for domain-specific compression on LLMs. It extracts a compressed, domain-specific, and task-agnostic LLM by identifying LLM weights that are pivotal for general capabilities, like linguistic capability and multi-task solving, and domain-specific knowledge. It assesses general weight importance by quantifying the error incurred upon their removal with the help of an open-domain calibration dataset. Then, it utilizes this general weight importance to refine the training loss, so that it preserves generality when fitting into a specific domain. By efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model emphasizing generality and specificity. Here generality refers to the general capabilities of an LLM when applied to domain-specific challenges such as language understanding and generation, and multi-task solving, and specificity refers to the capability of an LLM to understand domain-specific knowledge.
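
The pruning decision itself can be pictured as a per-weight thresholding over a combined importance score. The sketch below is a conceptual illustration only; the weighting factor alpha and the function name are assumptions, not the repository's implementation:

import torch

def dual_prune(weight, general_score, domain_score, sparsity=0.5, alpha=0.5):
    # alpha balances generality vs. specificity (illustrative hyperparameter)
    combined = alpha * general_score + (1 - alpha) * domain_score
    k = max(1, int(combined.numel() * sparsity))        # number of weights to remove
    threshold = torch.kthvalue(combined.flatten(), k).values
    mask = (combined > threshold).to(weight.dtype)      # 1 = keep, 0 = prune
    return weight * mask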

Datasets

We perform model training on a variety of datasets across the medical and legal domains. In the medical domain, we perform perplexity evaluation on a medical textbook called InternalMed_Harrison provided by MedQA. We also used the MedNLI, PubMedQA, and Health Question Summarization datasets for fine-tuning (with pruning) and evaluation, to test the multi-task solving capabilities of a pruned LLM. We are not allowed to share some of these datasets due to legal concerns, so we recommend collecting them yourself by completing the corresponding user agreements. For reproducibility, we release the importance scores of LLaMA2-7B computed by D-Pruner here.
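
The released importance scores can be inspected with standard PyTorch tooling. A minimal sketch, assuming the release stores a mapping from parameter names to score tensors (the filename below is hypothetical; check the release for the exact format):

import torch

scores = torch.load("llama2_7b_importance.pt", map_location="cpu")  # hypothetical filename
for name, tensor in scores.items():
    print(name, tuple(tensor.shape))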

In the legal domain, we collected 300 instances from the MultiLegalPile dataset for perplexity evaluation. We also used CaseHOLD and BillSum. The processed versions of all three datasets are in the legal_pruning_data folder.

Models and Experiments

First, install all required Python packages with pip install -r requirements.txt. To obtain the general weight importance scores for generality (at 50% sparsity on LLaMA2-7B), run the command below:

python code/general.py LLaMA2_HF_LOCATION c4 --sparsity 0.52
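
For intuition, the "error incurred upon removal" of a single weight can be approximated from calibration activations. A minimal sketch under a diagonal approximation (illustrative only; code/general.py may compute the score differently):

import torch

def removal_error_saliency(weight, calib_inputs):
    # weight: (out_features, in_features); calib_inputs: (num_tokens, in_features)
    # Zeroing weight[i, j] perturbs the layer output by weight[i, j] * x[:, j];
    # summing the squared perturbation over the calibration set gives a
    # per-weight importance estimate.
    col_energy = (calib_inputs ** 2).sum(dim=0)        # (in_features,)
    return weight.pow(2) * col_energy.unsqueeze(0)     # (out_features, in_features)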

This produces general weight importance scores, one per individual weight of LLaMA2. Then, to perform fine-tuning with pruning via D-Pruner for specificity, run the command below:

deepspeed --master_port 6006 code/src/train_lomo.py config/args_lomo.yaml
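
This command reads its settings from config/args_lomo.yaml. A minimal sketch of that file is shown below; only model_name_or_path and domain are named in this README, so every other field is an illustrative assumption:

# config/args_lomo.yaml (sketch)
model_name_or_path: /path/to/Llama-2-7b-hf   # your local LLaMA2 checkpoint
domain: medical                              # or "legal"
# ...remaining training hyperparameters (learning rate, batch size, sparsity,
# DeepSpeed settings, etc.) as defined in the actual config file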

Before you run this fine-tuning pipeline, remember to set model_name_or_path to your model path and domain to either "medical" or "legal" in config/args_lomo.yaml (see the sketch above). You may need to modify or tune other parameters depending on the model of your choice (e.g., the 13B version of LLaMA2 instead of 7B) and on your experimental settings. The final importance scores are saved for model pruning. To actually prune the LLaMA2-7B model, use the commands below:

# with iterative blocking
python code/save_model_iterative.py LLaMA2_HF_LOCATION IMPORTANCE_LOCATION OUTPUT_LOCATION

# without iterative blocking
python code/save_model.py LLaMA2_HF_LOCATION IMPORTANCE_LOCATION OUTPUT_LOCATION

The pruned model is saved to the path you specify as OUTPUT_LOCATION. If you are working with the 13B or 70B version of LLaMA2, you will need to adjust the numbers used for threshold computation, since the model architectures differ.
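
Roughly speaking, the pruning step compares each importance score against a threshold chosen to hit the target sparsity, either over a whole weight matrix or block by block. The sketch below is illustrative only and is not the exact logic of save_model_iterative.py; the block_size argument is an assumption standing in for the iterative-blocking variant:

import torch

def prune_layer(weight, score, sparsity=0.5, block_size=None):
    # Zero out the `sparsity` fraction of weights with the lowest importance.
    # block_size=None uses a single threshold for the whole matrix; a block
    # size gives every column block its own threshold.
    if block_size is None:
        block_size = weight.shape[1]
    for start in range(0, weight.shape[1], block_size):
        blk = slice(start, start + block_size)
        s = score[:, blk].flatten()
        k = max(1, int(s.numel() * sparsity))
        threshold = torch.kthvalue(s, k).values
        weight.data[:, blk] *= (score[:, blk] > threshold).to(weight.dtype)
    return weight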

To perform model evaluation, use commands:

# For perplexity evaluation
python code/legal_perplexity_evaluation.py LLaMA2_HF_LOCATION PRUNED_MODEL_LOCATION

# For classification evaluation
python code/test_casehold.py LLaMA2_HF_LOCATION PRUNED_MODEL_LOCATION

# For summarization evaluation
python code/test_billsum.py LLaMA2_HF_LOCATION PRUNED_MODEL_LOCATION

The evaluation code for the medical datasets is similar to the provided code for the legal datasets.
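
For reference, perplexity evaluation of a pruned checkpoint typically follows the standard pattern below (a sketch assuming a Hugging Face-format model directory; legal_perplexity_evaluation.py may differ in tokenization and windowing details):

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_path, texts, max_length=2048, device="cuda"):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to(device)
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_length).input_ids.to(device)
        with torch.no_grad():
            loss = model(ids, labels=ids).loss    # mean NLL over predicted tokens
        n = ids.shape[1] - 1                      # number of predicted tokens
        total_nll += loss.item() * n
        total_tokens += n
    return math.exp(total_nll / total_tokens)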

Acknowledgement

Citation

@inproceedings{zhang-etal-2024-pruning,
    title = "Pruning as a Domain-specific {LLM} Extractor",
    author = "Zhang, Nan  and
      Liu, Yanchi  and
      Zhao, Xujiang  and
      Cheng, Wei  and
      Bao, Runxue  and
      Zhang, Rui  and
      Mitra, Prasenjit  and
      Chen, Haifeng",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-naacl.91",
    pages = "1417--1428",
    abstract = "Large Language Models (LLMs) have exhibited remarkable proficiency across a wide array of NLP tasks. However, the escalation in model size also engenders substantial deployment costs. While few efforts have explored model pruning techniques to reduce the size of LLMs, they mainly center on general or task-specific weights. This leads to suboptimal performance due to lacking specificity on the target domain or generality on different tasks when applied to domain-specific challenges. This work introduces an innovative unstructured dual-pruning methodology, D-Pruner, for domain-specific compression on LLM. It extracts a compressed, domain-specific, and task- agnostic LLM by identifying LLM weights that are pivotal for general capabilities, like linguistic capability and multi-task solving, and domain-specific knowledge. More specifically, we first assess general weight importance by quantifying the error incurred upon their removal with the help of an open-domain calibration dataset. Then, we utilize this general weight importance to refine the training loss, so that it preserves generality when fitting into a specific domain. Moreover, by efficiently approximating weight importance with the refined training loss on a domain-specific calibration dataset, we obtain a pruned model emphasizing generality and specificity. Our comprehensive experiments across various tasks in healthcare and legal domains show the effectiveness of D-Pruner in domain-specific compression. Our code is available at https://github.com/psunlpgroup/D-Pruner.",
}
