Subpopulation Data Poisoning Attacks

Code for "Subpopulation Data Poisoning Attacks"

Code for experiments with text data can be found in ...

Code for image experiments can be found in image/

UCI Adult experiments can be found at this colab notebook: https://colab.research.google.com/drive/1qWZeEnzxO9P9lLpjIsvBSdTBZwNDK02I

Experiments with sentiment classification

To reproduce the results on the sentiment classification IMDB data run the following scripts.

Note: in all the scripts it is important to leave the seed to the default value. If changed, please make sure to use the same value for the seed in all scripts. It is used to split the training data in the defender's and attacker's sets.

Training

First fine tune the bert model used by the attacker:

python train_imdb_bert.py

Here the option --frozen can be used to specify fine tuning the "small" or Last Layer (LL) model. This will only tune the last attention layer and the classification layer.

Note: this script will also pre-process and tokenize the data if it doesn't find the DataFrames in the file system.

Evaluation

The notebook notebooks/evaluations/bert_base_evaluation.ipynb can be used to evaluate the fine tuned models on the IMDB test data.

Representations

Extract the representations of the defender's training set, adversary's training, and test set used in the attack by running:

python representations_imdb_bert.py

Again the --frozen flag can be used to work with the LL model.

This script assumes the trained model and data files are already available.

Attack

To run the sub-population poisoning attack on BERT over the IMDB data run the script attack_nlp.py:

usage: attack_nlp.py [-h] [--batch BATCH] [--epochs EPOCHS]
                     [--n_clusters N_CLUSTERS] [--pca_dim PCA_DIM]
                     [--seed SEED] [--learning_rate LEARNING_RATE]
                     [--poison_rate POISON_RATE] [--frozen] [--all]

optional arguments:
  -h, --help            show this help message and exit
  --batch BATCH         number of samples per batch
  --epochs EPOCHS       number of epochs of fine tuning
  --n_clusters N_CLUSTERS
                        number of clusters
  --pca_dim PCA_DIM     projection dimension for PCA
  --seed SEED           random seed
  --learning_rate LEARNING_RATE
                        learning rate
  --poison_rate POISON_RATE
                        poisoning rate
  --frozen              fine tunes only BERT last layer and classifier
  --all                 fine tunes BERT using the entire dataset

For instance to run the attack on BERT-FT with 100 clusters and a poisoning rate of 0.5:

python attack_nlp.py --poison_rate 0.5 --n_clusters 100

For convenience, the shell script run_all_bert.sh can be used to run all BERT-IMDB experiments present in the paper. Experimental results will be accumulated in results/bert.

Results

The statistics on the effect of subpopulation attacks on BERT models can be explored using the notebook: notebooks/evaluations/explore_bert_results.ipynb.

TRIM mitigation

Experiments on using TRIM to mitigate subpopulation attacks on BERT models over the IMDB dataset can be replicated using the notebooks: notebooks/evaluations/trim_defense_bert_ft.ipynb, notebooks/evaluations/trim_defense_bert_ll.ipynb.

jagielski / subpopulation-data-poisoning-attacks