leileigan / clean_label_textual_backdoor_attack


Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Introduction

This repository contains the data and code for the paper Triggerless Backdoor Attack for NLP Tasks with Clean Labels
by Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Shangwei Guo, Chun Fan.

If you find this repository helpful, please cite the following:

@article{gan2021triggerless,
  title={Triggerless Backdoor Attack for NLP Tasks with Clean Labels},
  author={Gan, Leilei and Li, Jiwei and Zhang, Tianwei and Li, Xiaoya and Meng, Yuxian and Wu, Fei and Guo, Shangwei and Fan, Chun},
  journal={arXiv preprint arXiv:2111.07970},
  year={2021}
}

Requirements

  • Python == 3.7
  • pip install -r requirements.txt

We also rely on some external resources; please download them manually and place them in the corresponding directories.

Train the Clean Victim Model

bash scripts/run_bert_sst_clean.sh

Poisoned Sample Generation

bash scripts/run_bert_sst_samples_gen.sh

Attack

bash scripts/run_bert_sst_attack.sh

Note that the PPL, GErr, and BERTScore constraints on the generated poisoned samples should be tuned to strike a balance between stealthiness and attack success rate.
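As a rough sketch of what this constraint-based filtering looks like (the field names, thresholds, and values below are illustrative assumptions, not the repository's actual implementation; in practice PPL, GErr, and BERTScore are computed with a language model, a grammar checker, and the BERTScore tool, respectively):

```python
# Illustrative sketch: select stealthy poisoned candidates by thresholding
# precomputed quality metrics. Thresholds and metric fields are hypothetical.

def filter_poisoned_samples(candidates, max_ppl=200.0, max_gerr=5, min_bert_score=0.85):
    """Keep candidates whose fluency/similarity metrics satisfy all constraints.

    Each candidate is a dict with precomputed metrics:
      'text'       : the generated poisoned sentence
      'ppl'        : language-model perplexity (lower = more fluent)
      'gerr'       : grammar-error count (lower = cleaner)
      'bert_score' : semantic similarity to the source sentence (higher = closer)
    """
    kept = []
    for c in candidates:
        if (c["ppl"] <= max_ppl
                and c["gerr"] <= max_gerr
                and c["bert_score"] >= min_bert_score):
            kept.append(c)
    return kept

candidates = [
    {"text": "a fluent poisoned sentence", "ppl": 120.0, "gerr": 1, "bert_score": 0.91},
    {"text": "a clunky poisoned sentence", "ppl": 540.0, "gerr": 9, "bert_score": 0.62},
]
print(len(filter_poisoned_samples(candidates)))  # -> 1 (only the fluent candidate)
```

Tightening the thresholds (lower PPL/GErr caps, higher BERTScore floor) improves stealthiness but shrinks the candidate pool, which typically lowers the achievable attack success rate.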

Table 1: Main attacking results. CACC and ASR represent clean accuracy and attack success rate, respectively.

| Dataset   | Model     | BERT-Base CACC | BERT-Base ASR | BERT-Large CACC | BERT-Large ASR |
|-----------|-----------|----------------|---------------|-----------------|----------------|
| SST-2     | Benign    | 92.3           | -             | 93.1            | -              |
| SST-2     | BadNet    | 90.9           | 100           | -               | -              |
| SST-2     | RIPPLES   | 90.7           | 100           | 91.6            | 100            |
| SST-2     | Syntactic | 90.9           | 98.1          | -               | -              |
| SST-2     | LWS       | 88.6           | 97.2          | 90.0            | 97.4           |
| SST-2     | Ours      | 89.7           | 98.0          | 90.8            | 99.1           |
| OLID      | Benign    | 84.1           | -             | 83.8            | -              |
| OLID      | BadNet    | 82.0           | 100           | -               | -              |
| OLID      | RIPPLES   | 83.3           | 100           | 83.7            | 100            |
| OLID      | Syntactic | 82.5           | 99.1          | -               | -              |
| OLID      | LWS       | 82.9           | 97.1          | 81.4            | 97.9           |
| OLID      | Ours      | 83.1           | 99.0          | 82.5            | 100            |
| AG's News | Benign    | 93.6           | -             | 93.5            | -              |
| AG's News | BadNet    | 93.9           | 100           | -               | -              |
| AG's News | RIPPLES   | 92.3           | 100           | 91.6            | 100            |
| AG's News | Syntactic | 94.3           | 100           | -               | -              |
| AG's News | LWS       | 92.0           | 99.6          | 92.6            | 99.5           |
| AG's News | Ours      | 92.5           | 92.8          | 90.1            | 96.7           |

Defense

Here, we test whether ONION, back-translation-based paraphrasing, and syntactically controlled paraphrasing can successfully defend against our triggerless textual backdoor attack.

bash scripts/run_bert_sst_defend.sh
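For context, ONION deletes each token in turn and measures how much the sentence's perplexity drops; tokens whose removal lowers perplexity beyond a threshold are treated as likely trigger words and removed. A minimal sketch with a pluggable perplexity scorer (the actual defense uses a GPT-2 language model; the toy scorer and threshold here are stand-ins for illustration only):

```python
def onion_filter(sentence, ppl_fn, threshold=10.0):
    """Remove words whose deletion lowers perplexity by more than `threshold`.

    ppl_fn: callable mapping a sentence string to a perplexity score
            (ONION uses GPT-2 for this; any scorer can be plugged in).
    """
    words = sentence.split()
    base = ppl_fn(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        suspicion = base - ppl_fn(reduced)  # large drop => suspicious token
        if suspicion <= threshold:
            kept.append(w)
    return " ".join(kept)

# Toy scorer: pretend the rare token "cf" inflates perplexity by 50.
def toy_ppl(s):
    return 100.0 + 50.0 * s.split().count("cf")

print(onion_filter("the movie was cf great", toy_ppl))  # -> "the movie was great"
```

Because our attack inserts no rare trigger token, there is no single outlier word for ONION to strip, which is consistent with the high ASR our method retains under ONION in Table 2.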

Table 2: Attacking results against three defense methods on the SST-2 dataset.

| Model     | ONION CACC | ONION ASR | Back Trans. CACC | Back Trans. ASR | Syntactic CACC | Syntactic ASR | Avg. CACC      | Avg. ASR       |
|-----------|------------|-----------|------------------|-----------------|----------------|---------------|----------------|----------------|
| Benign    | 91.32      | -         | 89.79            | -               | 82.02          | -             | 87.71          | -              |
| BadNet    | 89.95      | 40.30     | 84.78            | 49.94           | 81.86          | 58.27         | 85.31 (↓3.40)  | 49.50 (↓50.50) |
| RIPPLES   | 88.90      | 17.80     | -                | -               | -              | -             | -              | -              |
| Syntactic | 89.84      | 98.02     | 80.64            | 91.64           | 79.28          | 61.97         | 83.25 (↓5.98)  | 83.87 (↓15.23) |
| LWS       | 87.30      | 92.90     | 86.00            | 74.10           | 77.90          | 75.77         | 83.73 (↓4.10)  | 80.92 (↓17.08) |
| Ours      | 89.70      | 98.00     | 87.05            | 88.00           | 80.50          | 76.00         | 85.75 (↓2.68)  | 87.33 (↓9.27)  |

Contact

If you have any issues or questions about this repo, feel free to contact leileigan@zju.edu.cn.

License

Apache License 2.0
