uds-lsv / evaluating-logit-pairing-methods

Logit Pairing Methods Can Fool Gradient-Based Attacks [NeurIPS 2018 Workshop on Security in Machine Learning]

Home Page: https://arxiv.org/abs/1810.12042



Marius Mosbach*, Maksym Andriushchenko*, Thomas Trost, Matthias Hein, Dietrich Klakow

Saarland University and University of Tübingen

Main findings

The logit regularization methods proposed in "Adversarial Logit Pairing" distort the loss surface in the input space, which makes gradient-based attacks difficult.

We highlight that a proper evaluation of adversarial robustness is still an unresolved task. The PGD attack with the default settings is not necessarily a strong attack. Testing different attack hyperparameters (the number of iterations, the step size) is crucial when evaluating the robustness of a new defense.
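
A minimal sketch of such an attack, assuming a PyTorch model that maps images in [0, 1] to logits (the function name, signature, and framework are ours for illustration only; the attacks in this repository are implemented with TensorFlow and CleverHans):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size, n_iter, rand_init=True):
    """L-infinity PGD: ascend the cross-entropy loss with signed gradient steps,
    projecting the perturbation back into the eps-ball after every step."""
    delta = torch.zeros_like(x)
    if rand_init:
        delta.uniform_(-eps, eps)                          # random start inside the eps-ball
    delta.requires_grad_(True)

    for _ in range(n_iter):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()               # gradient-ascent step
            delta.clamp_(-eps, eps)                        # project onto the eps-ball
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)  # keep the image in [0, 1]
    return (x + delta).detach()
```

Sweeping `n_iter` and `step_size` (for example, many more iterations with a smaller step) is exactly the kind of hyperparameter search we found necessary for a reliable robustness evaluation.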

Performing many random restarts is very important when an attack deteriorates the loss surface. As an example, consider the loss values achieved by the PGD attack on different test points for a CLP model on MNIST (see the corresponding figure in the repository). Using only 1 random restart (a standard practice) may not be sufficient to find an adversarial example. The vertical red line denotes the loss value −ln(0.1); for this and any higher loss value, we are guaranteed to have found an adversarial example.
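
A sketch of this restart strategy, reusing the illustrative `pgd_attack` function from above (again a hypothetical helper, not the repository's actual TensorFlow code): run the attack from several random starting points, keep for every test image the perturbation with the highest loss, and check the −ln(0.1) criterion.

```python
import math
import torch
import torch.nn.functional as F

def pgd_with_restarts(model, x, y, eps, step_size, n_iter, n_restarts):
    """Run PGD from several random starts and keep, per input, the perturbation
    that achieves the highest cross-entropy loss (the worst case for the model)."""
    best_x_adv = x.clone()
    best_loss = torch.full((x.size(0),), -float('inf'), device=x.device)

    for _ in range(n_restarts):
        x_adv = pgd_attack(model, x, y, eps, step_size, n_iter)
        with torch.no_grad():
            loss = F.cross_entropy(model(x_adv), y, reduction='none')
        improved = loss > best_loss
        best_x_adv[improved] = x_adv[improved]
        best_loss = torch.where(improved, loss, best_loss)

    # For a 10-class problem, a loss of at least -ln(0.1) = ln(10) means the true
    # class has probability <= 0.1, so some other class is at least as probable
    # and the point is misclassified (up to ties).
    surely_adversarial = best_loss >= math.log(10)
    return best_x_adv, best_loss, surely_adversarial
```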

Results and models

In the tables below, we provide our independently trained models on MNIST, CIFAR-10, and Tiny ImageNet, together with the Fine-tuned Plain + ALP LL model released here.

Note that we could break CLP, LSQ, and some ALP models only by using the PGD attack with many iterations and many random restarts. This is computationally expensive, so an interesting direction is to come up with new attacks that can break such models with less computational effort. For this purpose, we encourage researchers to use the models given below for an evaluation of new attacks.
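
For reference, the adversarial accuracy reported in the tables below is the fraction of test points on which the attack fails to change the model's prediction. A sketch of that computation using the illustrative restart wrapper above (the actual evaluation is performed by eval.sh):

```python
import torch

def adversarial_accuracy(model, loader, eps, step_size, n_iter, n_restarts):
    """Fraction of test points still classified correctly after the attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv, _, _ = pgd_with_restarts(model, x, y, eps, step_size, n_iter, n_restarts)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.size(0)
    return correct / total
```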

MNIST

| Model | Accuracy | Adversarial accuracy with the baseline attack | Adversarial accuracy with our strongest attack |
|---|---|---|---|
| Plain | 99.2% | 0.0% | 0.0% |
| CLP | 98.8% | 62.4% | 4.1% |
| LSQ | 98.8% | 70.6% | 5.0% |
| Plain + ALP | 98.5% | 96.0% | 88.9% |
| 50% AT + ALP | 98.3% | 97.2% | 89.9% |
| 50% AT | 99.1% | 92.7% | 88.2% |
| 100% AT + ALP | 98.4% | 96.6% | 85.7% |
| 100% AT | 98.9% | 95.2% | 88.0% |

CIFAR-10

| Model | Accuracy | Adversarial accuracy with the baseline attack | Adversarial accuracy with our strongest attack |
|---|---|---|---|
| Plain | 83.0% | 0.0% | 0.0% |
| CLP | 73.9% | 2.8% | 0.0% |
| LSQ | 81.7% | 27.0% | 1.7% |
| Plain + ALP | 71.5% | 23.6% | 10.7% |
| 50% AT + ALP | 70.4% | 21.8% | 10.5% |
| 50% AT | 73.8% | 18.6% | 7.3% |
| 100% AT + ALP | 65.7% | 19.0% | 6.4% |
| 100% AT | 65.7% | 16.0% | 6.7% |

Tiny ImageNet

| Model | Accuracy | Adversarial accuracy with the baseline attack | Adversarial accuracy with our strongest attack |
|---|---|---|---|
| Plain | 53.0% | 3.9% | 0.4% |
| CLP | 48.5% | 12.2% | 0.7% |
| LSQ | 49.4% | 12.8% | 0.8% |
| Plain + ALP LL | 53.5% | 17.2% | 0.8% |
| Fine-tuned Plain + ALP LL | 72.0% | 31.8% | 3.6% |
| 50% AT LL | 46.3% | 25.1% | 9.4% |
| 50% AT LL + ALP LL | 45.2% | 26.3% | 13.5% |
| 100% AT LL | 41.2% | 25.5% | 16.3% |
| 100% AT LL + ALP LL | 37.0% | 25.4% | 16.5% |

Requirements

  • For Tiny ImageNet code: TensorFlow 1.8 and Python 2.7
  • For MNIST and CIFAR-10 code: TensorFlow 1.8, PyTorch 0.4, CleverHans 3.0.1, and Python 3.5
  • In order to set up the Tiny ImageNet dataset in the correct format (TFRecords), run tiny_imagenet.sh.

How to run the code

Tiny ImageNet code

You can find examples of how to run the code for Tiny ImageNet in the following scripts:

  • train.sh Trains a model from scratch.
  • eval.sh Evaluates an existing model (see the table for the particular Tiny ImageNet models). Note that with this script you might not be able to reproduce the results from the paper exactly, since the random seed was not fixed for those evaluations. However, the results are expected to be very close to the reported ones.

More details are available via python <script> --helpfull.

Note that the code for Tiny ImageNet is based on the official code release of the "Adversarial Logit Pairing" paper made in July 2018.

MNIST/CIFAR-10 code

You can find examples of how to run the code for MNIST and CIFAR-10 in the following scripts:

  • train.sh Trains a model from scratch.
  • eval.sh Evaluates an existing model.

More details are available by python train.py --help and python eval.py --help.

Contact

Please contact Marius Mosbach or Maksym Andriushchenko regarding this code.

Citation

@inproceedings{mosbach2018logit,
  title={Logit Pairing Methods Can Fool Gradient-Based Attacks},
  author={Mosbach, Marius and Andriushchenko, Maksym and Trost, Thomas and Hein, Matthias and Klakow, Dietrich},
  booktitle={NeurIPS 2018 Workshop on Security in Machine Learning},
  year={2018}
}

About

License: Apache License 2.0


Languages

Python 94.9%, Shell 5.1%