This repository contains the data and code for the paper *Triggerless Backdoor Attack for NLP Tasks with Clean Labels* by Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Shangwei Guo, and Chun Fan.
If you find this repository helpful, please cite the following:
```bibtex
@article{gan2021triggerless,
  title={Triggerless Backdoor Attack for NLP Tasks with Clean Labels},
  author={Gan, Leilei and Li, Jiwei and Zhang, Tianwei and Li, Xiaoya and Meng, Yuxian and Wu, Fei and Guo, Shangwei and Fan, Chun},
  journal={arXiv preprint arXiv:2111.07970},
  year={2021}
}
```
- Python == 3.7

Install the Python dependencies:

```bash
pip install -r requirements.txt
```
We also rely on some external resources. Please download them manually and put them into the corresponding directories:

- Download the counter-fitted word vectors and put them into the `data/AttackAssist.CounterFit` directory.
- Download the structure-controlled paraphrasing model and put it into the `data/AttackAssist.SCPN` directory.
- Download the sentence tokenizer model and put it into the `data/TProcess.NLTKSentTokenizer` directory.
- Download the Language Tool model following the instructions in Language Tool, and unzip it into the directory where the `language_tool_python` package resides.
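Before launching the scripts, it can help to confirm the resources above are in place. A minimal pre-flight check, written for this README (the helper name `missing_resources` is ours, not part of the repo):

```python
import os

# Expected resource directories, as listed above.
REQUIRED_DIRS = [
    "data/AttackAssist.CounterFit",     # counter-fitted word vectors
    "data/AttackAssist.SCPN",           # structure-controlled paraphrasing model
    "data/TProcess.NLTKSentTokenizer",  # sentence tokenizer model
]

def missing_resources(root="."):
    """Return the expected resource directories that are absent or empty."""
    return [
        d for d in REQUIRED_DIRS
        if not os.path.isdir(os.path.join(root, d))
        or not os.listdir(os.path.join(root, d))
    ]

print(missing_resources())  # lists whatever still needs downloading
```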
To attack BERT on SST-2, run the scripts in order — train a clean victim model, generate poisoned samples, then launch the attack:

```bash
bash scripts/run_bert_sst_clean.sh
bash scripts/run_bert_sst_samples_gen.sh
bash scripts/run_bert_sst_attack.sh
```
Note that the PPL, GErr, and BERTScore constraints on the generated poisoned samples should be adjusted to balance stealthiness against attack success rate.
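The constraint step above amounts to thresholding candidate samples on their automatic quality metrics. An illustrative sketch (not the repo's actual code; the `Candidate` class and default thresholds are placeholders we chose for the example):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    text: str
    ppl: float         # language-model perplexity (lower = more fluent)
    gerr: int          # grammar-error count, e.g. from LanguageTool
    bert_score: float  # semantic similarity to the source sentence

def filter_candidates(cands: List[Candidate],
                      max_ppl: float = 200.0,
                      max_gerr: int = 4,
                      min_bert_score: float = 0.85) -> List[Candidate]:
    """Keep only candidates within the PPL, GErr, and BERTScore bounds."""
    return [c for c in cands
            if c.ppl <= max_ppl
            and c.gerr <= max_gerr
            and c.bert_score >= min_bert_score]
```

Loosening the thresholds admits more (higher-ASR but less stealthy) samples; tightening them does the opposite.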
Table 1: Main attacking results. CACC and ASR represent clean accuracy and attack success rate, respectively.
| Dataset | Model | BERT-Base CACC | BERT-Base ASR | BERT-Large CACC | BERT-Large ASR |
|---|---|---|---|---|---|
| SST-2 | Benign | 92.3 | - | 93.1 | - |
| | BadNet | 90.9 | 100 | - | - |
| | RIPPLES | 90.7 | 100 | 91.6 | 100 |
| | Syntactic | 90.9 | 98.1 | - | - |
| | LWS | 88.6 | 97.2 | 90.0 | 97.4 |
| | Ours | 89.7 | 98.0 | 90.8 | 99.1 |
| OLID | Benign | 84.1 | - | 83.8 | - |
| | BadNet | 82.0 | 100 | - | - |
| | RIPPLES | 83.3 | 100 | 83.7 | 100 |
| | Syntactic | 82.5 | 99.1 | - | - |
| | LWS | 82.9 | 97.1 | 81.4 | 97.9 |
| | Ours | 83.1 | 99.0 | 82.5 | 100 |
| AG's News | Benign | 93.6 | - | 93.5 | - |
| | BadNet | 93.9 | 100 | - | - |
| | RIPPLES | 92.3 | 100 | 91.6 | 100 |
| | Syntactic | 94.3 | 100 | - | - |
| | LWS | 92.0 | 99.6 | 92.6 | 99.5 |
| | Ours | 92.5 | 92.8 | 90.1 | 96.7 |
Here, we test whether ONION, a back-translation-based paraphrasing defense, and a syntactically controlled paraphrasing defense can successfully defend against our triggerless textual backdoor attack.
```bash
bash scripts/run_bert_sst_defend.sh
```
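ONION, the first defense above, flags a token as a likely trigger when deleting it sharply lowers the sentence's perplexity. A minimal sketch of that idea, with the language model abstracted as a callable so the example stays self-contained (a real defense would score with an actual LM such as GPT-2; the toy scorer below is ours):

```python
from typing import Callable, List

def onion_filter(words: List[str],
                 ppl: Callable[[List[str]], float],
                 threshold: float) -> List[str]:
    """Drop words whose removal lowers perplexity by more than `threshold`."""
    base = ppl(words)
    kept = []
    for i, w in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        suspicion = base - ppl(reduced)  # large drop => likely trigger word
        if suspicion <= threshold:
            kept.append(w)
    return kept

# Toy scorer: pretend the rare token "cf" inflates perplexity by 100.
toy_ppl = lambda ws: 50.0 + 100.0 * ws.count("cf")

print(onion_filter("the movie was cf great".split(), toy_ppl, 30.0))
# ['the', 'movie', 'was', 'great']
```

Because our attack inserts no trigger word at all, there is no single suspicious token for such a filter to remove, which is consistent with the high post-defense ASR of our method in Table 2.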
Table 2. Attacking results against three defense methods on the SST-2 dataset.
| Model | ONION CACC | ONION ASR | Back Translation CACC | Back Translation ASR | Syntactic Structure CACC | Syntactic Structure ASR | Average CACC | Average ASR |
|---|---|---|---|---|---|---|---|---|
| Benign | 91.32 | - | 89.79 | - | 82.02 | - | 87.71 | - |
| BadNet | 89.95 | 40.30 | 84.78 | 49.94 | 81.86 | 58.27 | 85.31 (↓ 3.4) | 49.50 (↓ 50.50) |
| RIPPLES | 88.90 | 17.80 | - | - | - | - | - | - |
| Syntactic | 89.84 | 98.02 | 80.64 | 91.64 | 79.28 | 61.97 | 83.25 (↓ 5.98) | 83.87 (↓ 15.23) |
| LWS | 87.30 | 92.90 | 86.00 | 74.10 | 77.90 | 75.77 | 83.73 (↓ 4.10) | 80.92 (↓ 17.08) |
| Ours | 89.70 | 98.00 | 87.05 | 88.00 | 80.50 | 76.00 | 85.75 (↓ 2.68) | 87.33 (↓ 9.27) |
If you have any issues or questions about this repo, feel free to contact leileigan@zju.edu.cn.