BToP/AToP

Implementation of our paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022.

Overview

Prompt-based learning is a new trend in text classification. However, this new learning paradigm has universal vulnerability, meaning that phrases that mislead a pre-trained language model can universally interfere downstream prompt-based models. In this repo, we implement two methods to inject or find these phrases.

Backdoor Triggers on Prompt-based Learning (BToP) assumes that the attacker can access the training phrase of the language model. These triggers are injected to the language model by fine-tuning.
Adversarial Triggers on Prompt-based Learning (AToP) assumes no access to language model training. These triggers are discovered using a beam search algorithm on off-the-shelf language models.

Installation

Please install pytorch>=1.8.0 and correctly configure the GPU accelerator. (GPU is required.)

Install all requirements by

pip install -r requirements.txt

Usage

BToP: Train a backdoored language model

src/insert_btop.py implements the backdoor attack on PLMs during the pre-training stage.

command:

python3 -m src.insert_btop --subsample_size 30000 --bert_type roberta-large \
	--batch_size 16 --num_epochs 1 --save_path poisoned_lm

The arguments:

sample_nums: The number of samples we sample from the general corpus.
bert_type: The type of PLMs.
save_path: Path to save the backdoor injected models.

output: The backdoor injected model will be saved at: poisoned_lm.

AToP: Search for triggers from existing language models

src/search_atop.py implements the trigger search on RoBERTa-large model.

command:

python3 -m src.search_atop.py --trigger_len 3 --trigger_pos all

To search for position-sensitive triggers, you can change --trigger_pos to

prefix: the trigger is supposed to be placed before the text.
suffix: the trigger is supposed to be placed after the text.

For more arguments, see python3 atop/search_atop.py --help.

output: Results will be stored in the triggers/ folder as a JSON file.

Evaluation

src/eval.py can evaluate both AToP and BToP.

Evaluate BToP

python3 -m src.eval --shots 16 --dataset ag_news --model_path poisoned_lm --target_label 0 \
	--repeat 5 --bert_type roberta-large --template_id 0

Evaluate AToP

python3 -m src.eval --shots 16 --dataset ag_news --target_label -1 \
	--repeat 5 --bert_type roberta-large --template_id 0 --load_trigger trigger/<trigger_json>

The arguments:

dataset: The evaluation datasets.
shots: The number of samples per label.
model_path:
- In case of BToP, set the filename of backdoor injected model.
- In case of AToP, do not set this argument.
target_label:
- In case of targeted attack, set the target label id. (Used in BToP)
- In case of untargeted attack, set -1. (Used in AToP)
bert_type: The type of PLMs.
template_id: The chosen template. Check prompt/ folder for all prompt templates.

Important note for AToP:

for all-purpose triggers, you can choose template_id from {0, 1, 2, 3}.
for prefix triggers, you can choose template_id from {0, 2}.
for suffix triggers, you can choose template_id from {1, 3}.

Use python3 -m src.eval --help for details.

Datasets and Prompts

The datasets and prompts used in experiments are in data/ and prompt/ folders.

Citing BToP/AToP

If you use AToP and/or BToP, please cite the following work:

Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu. Exploring the Universal Vulnerability of Prompt-based Learning Paradigm. Findings of NAACL, 2022.

@inproceedings{xu-etal-2022-exploring,
    title = "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm",
    author = "Xu, Lei  and
      Chen, Yangyi  and
      Cui, Ganqu  and
      Gao, Hongcheng  and
      Liu, Zhiyuan",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.137",
    pages = "1799--1810"
}

leix28 / prompt-universal-vulnerability