ryderling / DEEPSEC

DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model

Attacks are not run on defenses in an all-pairs manner

carlini opened this issue

The only meaningful metric for evaluating a defense is the effectiveness of attacks that are run against it.

This paper does not actually measure this, however. It generates adversarial examples on a baseline model and then tests them on different defenses, and uses this as a way to assess the supposed robustness of the various defenses.
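To make the point concrete, here is a rough sketch of the evaluation pattern I'm describing, not DEEPSEC's actual code; the model and data-loader names are placeholders. The adversarial examples are crafted against an undefended baseline model, and only afterwards are they handed to the defended model:

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Craft FGSM adversarial examples against `model` only."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()


def transfer_eval(baseline_model, defended_model, loader, eps=8 / 255):
    """Accuracy of the *defended* model on examples crafted against the *baseline*."""
    correct, total = 0, 0
    for x, y in loader:
        x_adv = fgsm(baseline_model, x, y, eps)      # the attack never sees the defense
        with torch.no_grad():
            pred = defended_model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```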

This basic flaw completely undermines the purpose of a security evaluation.

As a point of comparison, imagine that I were designing a new computer architecture intended to be secure against memory corruption vulnerabilities. I do this by taking a pre-existing computer architecture and, instead of making it little-endian or big-endian, implementing some new "middle-endian" layout where the least significant byte is put in the middle of the word. This crazy new architecture would appear to be perfectly robust against all existing malware. However, it would be fundamentally incorrect to call this new computer architecture "more secure": the only thing I have done is superficially break existing exploits so that they no longer work on the new system.

In the context of adversarial examples, notice that this type of analysis is not useless: it does tell us something useful about the ability of these attacks to transfer, and about the ability of the models to defend against transferability attacks. If the paper had made this observation and drawn its conclusions from that perspective, then at least the fundamental idea behind the table would have been correct. (None of the remaining errors would be resolved, still.)

Worryingly, the DeepSec code itself does not appear to support running any of the attacks on a new defense model. It looks like the code only supports loading raw model files into the attacks natively.

Fixing this fatal and fundamental error in the paper's evaluation will not be easy. Many of the defenses are non-differentiable or cause gradient masking, which is exactly why the original papers believed their defenses were secure to begin with. Performing a proper security evaluation necessarily requires adapting attacks to defenses. I see no easy way to resolve this issue and correct the paper other than devoting significant and extensive work to performing this analysis correctly.
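To illustrate what adapting an attack to a defense looks like in practice, here is a rough sketch of one standard adaptation, a BPDA-style PGD in the spirit of Athalye et al., for a defense whose input preprocessing is non-differentiable; `model` and `preprocess` are placeholders, not DEEPSEC components:

```python
import torch
import torch.nn.functional as F


def bpda_pgd(model, preprocess, x, y, eps=8 / 255, alpha=2 / 255, steps=40):
    """PGD through a non-differentiable `preprocess` step, using a
    straight-through (identity) gradient approximation on the backward pass."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        with torch.no_grad():                       # run the defense's preprocessing
            x_def = preprocess(x_adv)               # (may be non-differentiable)
        x_def = x_adv + (x_def - x_adv).detach()    # backward pass treats g(x) as x
        loss = F.cross_entropy(model(x_def), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```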

commented

As we stated at the beginning of Section II, "In this paper, we consider the non-adaptive and white-box attack scenarios, where the adversary has full knowledge of the target DL model but is not aware of defenses that might be deployed." Readers or potential users of this paper and its code should be aware of this basic prerequisite from the start.

It seems you suggest that the only way to evaluate a defense method is to apply a fully adaptive white-box attack, in which the attacker needs to try their best to find a different "best" adaptive attack strategy, rather than the default and simple strategy, for each defense. In most cases, such an adaptive attack relies heavily on the knowledge and skill of the attacker when considering different defenses (totally different defense methods, or one defense with different kinds of hyper-parameters), and none of us can guarantee or prove that a given adaptive attack strategy is the "best". It may be another research direction in adversarial examples.

On the other hand, if users want to perform adaptive attacks on defenses, the DEEPSEC code can support running the attacks on most of the defense-enhanced models by replacing the raw model file with the defense-enhanced model file.

Okay, so let's put aside the question of what it means to do a security evaluation. I think we have fundamental disagreements there that aren't going to be resolved over a GitHub issue. The security community has (since its inception) decided that adaptive attacks are what is necessary to judge robustness, and if you want to do something different then that's fine, I guess.

My main point is that the paper consistently presents itself as a thorough evaluation in which it actually runs white-box attacks on the defenses and finds these defenses "more or less effective".

And it's great that you admit what you're doing is a transferability analysis in Section 2, but I would expect this to be stated clearly in the abstract, introduction, and conclusion, and then every time you make any sweeping generalization.

For example, you may want to change the following:

"Leveraging DEEPSEC, we systematically evaluate the existing adversarial attack and defense methods, and draw a set of key findings" -> "Leveraging DEEPSEC, we systematically evaluate the existing adversarial attack and defense methods in the black-box, zero-knowledge, transfer-only threat model, and draw a set of key findings"

"For complete defenses, most of them have capability of defending against some adversarial attacks" -> "For complete defenses, most of them have capability of defending against transfer-only attacks."

"All detection methods show comparable discriminative ability against existing attacks." -> "All detection methods show comparable discriminative ability against existing transferable adversarial examples."

This paper's conference hasn't even happened yet, and already at NDSS one of the speakers used a quote from the DeepSec paper to argue that existing defenses are effective in the white-box setting.