ryderling / DEEPSEC

DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model

Comparing attack effectiveness is done incorrectly

carlini opened this issue

Using the data provided, it is not possible to compare the efficacy of different attacks across models. Imagine we would like to decide whether LLC or ILLC was the stronger attack on the CIFAR-10 dataset.

Superficially, I might look at the “Average” column and see that the average model accuracy under LLC is 39.4% compared to 58.7% accuracy under ILLC. While in general averages in security can be misleading, fortunately, for all models except one, LLC reduces the model accuracy more than ILLC does, often by over twenty percentage points.

A reasonable reader might therefore conclude (incorrectly!) that LLC is the stronger attack. Why is this conclusion incorrect? The LLC attack only succeeded 134 times out of 1000 on the baseline CIFAR-10 model. Therefore, when the paper writes that the accuracy of PGD adversarial training under LLC is 61.2%, what this number means is that 38.8% of the adversarial examples that are effective on the baseline model are also effective on the adversarially trained model. How the model would perform on the other 866 examples is not reported. In contrast, when the base model is evaluated on the ILLC attack, the attack succeeded on all 1000 examples. The 83.7% accuracy obtained by adversarial training is therefore inherently incomparable to the 61.2% value.
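To make the denominator mismatch concrete, here is a small back-of-the-envelope sketch (plain Python; the counts and percentages are the ones quoted above, and the variable names are mine):

```python
# Hypothetical illustration of the denominator mismatch described above.
# The counts come from the discussion; the variable names are invented.

# LLC: only 134 of the 1000 attacks succeed on the baseline model.
# Table V then reports the defended model's accuracy on those 134 examples only.
llc_success_on_baseline = 134
llc_defended_accuracy = 0.612  # 61.2%
llc_still_effective = round(llc_success_on_baseline * (1 - llc_defended_accuracy))
# -> 52 of the original 1000 test points are known to fool the defended model;
#    the remaining 866 points were never evaluated against the defense.

# ILLC: all 1000 attacks succeed on the baseline model.
illc_success_on_baseline = 1000
illc_defended_accuracy = 0.837  # 83.7%
illc_still_effective = round(illc_success_on_baseline * (1 - illc_defended_accuracy))
# -> 163 of the original 1000 test points fool the defended model.

# Comparing 61.2% with 83.7% directly is meaningless: the two percentages
# are taken over different subsets of the test set (134 vs. 1000 examples).
print(llc_still_effective, illc_still_effective)  # 52 163
```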

commented

Comparing different attacks against one model and determining how powerful each attack is are both evaluated and discussed in Section IV.A, where all attacks target the same model on the same natural examples. And obviously, according to the results in Table III, LLC is not a stronger attack than ILLC.

However, Table V shows the classification accuracy of defense-enhanced models against those adversarial examples that have been misclassified by the raw model. That is, 100% minus the classification accuracy does not represent the success rate of the attacks. Therefore, the numbers in Table V should be interpreted as the effectiveness of defenses (classification accuracy) against successful adversarial examples.
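In other words, the two metrics have different denominators. A minimal sketch of the distinction, assuming hypothetical prediction arrays (none of these names come from the DEEPSEC codebase):

```python
import numpy as np

def table_v_accuracy(def_preds, labels, raw_preds):
    # Table V convention: keep only the adversarial examples that the raw
    # model already misclassifies, then measure the defended model's
    # accuracy on that subset.
    fooled_raw = raw_preds != labels
    return np.mean(def_preds[fooled_raw] == labels[fooled_raw])

def overall_attack_success(def_preds, labels):
    # A metric that is comparable across attacks: the fraction of the FULL
    # test set whose adversarial version fools the defended model.
    return np.mean(def_preds != labels)

# Note: 1 - table_v_accuracy(...) is an error rate conditioned on having
# fooled the raw model, not the attack's overall success rate.
```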

Right, that's the correct way to interpret these numbers. My concern is that we are going to see someone say "LLC was found to be a stronger attack against defended models than ILLC [cite to this paper]". There's a nice saying that you shouldn't write just to be understood, but so that you can't be misunderstood, and the current presentation of the paper encourages exactly this type of misunderstanding. I do agree the data is there to see that LLC is weaker than ILLC on a baseline model. But because this is the only table that tries to show how well LLC/ILLC work against defended models (the other table just shows how well they work on an undefended model), people will take it as such.