leileigan / clean_label_textual_backdoor_attack


Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Introduction

This repository contains the data and code for the paper Triggerless Backdoor Attack for NLP Tasks with Clean Labels
by Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Shangwei Guo, Chun Fan.

If you find this repository helpful, please cite the following:

@article{gan2021triggerless,
  title={Triggerless Backdoor Attack for NLP Tasks with Clean Labels},
  author={Gan, Leilei and Li, Jiwei and Zhang, Tianwei and Li, Xiaoya and Meng, Yuxian and Wu, Fei and Guo, Shangwei and Fan, Chun},
  journal={arXiv preprint arXiv:2111.07970},
  year={2021}
}

Requirements

  • Python == 3.7
  • pip install -r requirements.txt

We also rely on some external resources; please download them manually and place them in the corresponding directories.

Train the Clean Victim Model

bash scripts/run_bert_sst_clean.sh

Poisoned Sample Generation

bash scripts/run_bert_sst_samples_gen.sh

Attack

bash scripts/run_bert_sst_attack.sh

Note that the PPL, GErr, and BERTScore constraints on the generated poisoned samples should be tuned to strike a balance between stealthiness and attack success rate.
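As a rough sketch of what this constraint-based filtering looks like (the field names, thresholds, and values below are illustrative assumptions, not the repository's actual implementation; in practice PPL, GErr, and BERTScore are computed with a language model, a grammar checker, and the BERTScore tool, respectively):

```python
# Illustrative sketch: select stealthy poisoned candidates by thresholding
# precomputed quality metrics. Thresholds and metric fields are hypothetical.

def filter_poisoned_samples(candidates, max_ppl=200.0, max_gerr=5, min_bert_score=0.85):
    """Keep candidates whose fluency/similarity metrics satisfy all constraints.

    Each candidate is a dict with precomputed metrics:
      'text'       : the generated poisoned sentence
      'ppl'        : language-model perplexity (lower = more fluent)
      'gerr'       : grammar-error count (lower = cleaner)
      'bert_score' : semantic similarity to the source sentence (higher = closer)
    """
    kept = []
    for c in candidates:
        if (c["ppl"] <= max_ppl
                and c["gerr"] <= max_gerr
                and c["bert_score"] >= min_bert_score):
            kept.append(c)
    return kept

candidates = [
    {"text": "a fluent poisoned sentence", "ppl": 120.0, "gerr": 1, "bert_score": 0.91},
    {"text": "a clunky poisoned sentence", "ppl": 540.0, "gerr": 9, "bert_score": 0.62},
]
print(len(filter_poisoned_samples(candidates)))  # -> 1 (only the fluent candidate)
```

Tightening the thresholds (lower PPL/GErr caps, higher BERTScore floor) improves stealthiness but shrinks the candidate pool, which typically lowers the achievable attack success rate.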

Table 1: Main attacking results. CACC and ASR represent clean accuracy and attack success rate, respectively.

| Dataset   | Model     | BERT-Base CACC | BERT-Base ASR | BERT-Large CACC | BERT-Large ASR |
|-----------|-----------|----------------|---------------|-----------------|----------------|
| SST-2     | Benign    | 92.3           | -             | 93.1            | -              |
| SST-2     | BadNet    | 90.9           | 100           | -               | -              |
| SST-2     | RIPPLES   | 90.7           | 100           | 91.6            | 100            |
| SST-2     | Syntactic | 90.9           | 98.1          | -               | -              |
| SST-2     | LWS       | 88.6           | 97.2          | 90.0            | 97.4           |
| SST-2     | Ours      | 89.7           | 98.0          | 90.8            | 99.1           |
| OLID      | Benign    | 84.1           | -             | 83.8            | -              |
| OLID      | BadNet    | 82.0           | 100           | -               | -              |
| OLID      | RIPPLES   | 83.3           | 100           | 83.7            | 100            |
| OLID      | Syntactic | 82.5           | 99.1          | -               | -              |
| OLID      | LWS       | 82.9           | 97.1          | 81.4            | 97.9           |
| OLID      | Ours      | 83.1           | 99.0          | 82.5            | 100            |
| AG's News | Benign    | 93.6           | -             | 93.5            | -              |
| AG's News | BadNet    | 93.9           | 100           | -               | -              |
| AG's News | RIPPLES   | 92.3           | 100           | 91.6            | 100            |
| AG's News | Syntactic | 94.3           | 100           | -               | -              |
| AG's News | LWS       | 92.0           | 99.6          | 92.6            | 99.5           |
| AG's News | Ours      | 92.5           | 92.8          | 90.1            | 96.7           |

Defense

Here, we test whether ONION, back-translation-based paraphrasing, and syntactically controlled paraphrasing can successfully defend against our triggerless textual backdoor attack.

bash scripts/run_bert_sst_defend.sh
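For context, ONION deletes each token in turn and measures how much the sentence's perplexity drops; tokens whose removal lowers perplexity beyond a threshold are treated as likely trigger words and removed. A minimal sketch with a pluggable perplexity scorer (the actual defense uses a GPT-2 language model; the toy scorer and threshold here are stand-ins for illustration only):

```python
def onion_filter(sentence, ppl_fn, threshold=10.0):
    """Remove words whose deletion lowers perplexity by more than `threshold`.

    ppl_fn: callable mapping a sentence string to a perplexity score
            (ONION uses GPT-2 for this; any scorer can be plugged in).
    """
    words = sentence.split()
    base = ppl_fn(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        suspicion = base - ppl_fn(reduced)  # large drop => suspicious token
        if suspicion <= threshold:
            kept.append(w)
    return " ".join(kept)

# Toy scorer: pretend the rare token "cf" inflates perplexity by 50.
def toy_ppl(s):
    return 100.0 + 50.0 * s.split().count("cf")

print(onion_filter("the movie was cf great", toy_ppl))  # -> "the movie was great"
```

Because our attack inserts no rare trigger token, there is no single outlier word for ONION to strip, which is consistent with the high ASR our method retains under ONION in Table 2.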

Table 2: Attacking results against three defense methods on the SST-2 dataset.

| Model     | ONION CACC | ONION ASR | Back Trans. CACC | Back Trans. ASR | Syntactic CACC | Syntactic ASR | Avg. CACC      | Avg. ASR       |
|-----------|------------|-----------|------------------|-----------------|----------------|---------------|----------------|----------------|
| Benign    | 91.32      | -         | 89.79            | -               | 82.02          | -             | 87.71          | -              |
| BadNet    | 89.95      | 40.30     | 84.78            | 49.94           | 81.86          | 58.27         | 85.31 (↓3.40)  | 49.50 (↓50.50) |
| RIPPLES   | 88.90      | 17.80     | -                | -               | -              | -             | -              | -              |
| Syntactic | 89.84      | 98.02     | 80.64            | 91.64           | 79.28          | 61.97         | 83.25 (↓5.98)  | 83.87 (↓15.23) |
| LWS       | 87.30      | 92.90     | 86.00            | 74.10           | 77.90          | 75.77         | 83.73 (↓4.10)  | 80.92 (↓17.08) |
| Ours      | 89.70      | 98.00     | 87.05            | 88.00           | 80.50          | 76.00         | 85.75 (↓2.68)  | 87.33 (↓9.27)  |

Contact

If you have any issues or questions about this repo, feel free to contact leileigan@zju.edu.cn.

License

Apache License 2.0
