ain-soph / trojanzoo

TrojanZoo provides a universal PyTorch platform to conduct security research (especially on backdoor attacks and defenses) for image classification in deep learning.

Home Page: https://ain-soph.github.io/trojanzoo

Poor performance of defenses on badnet

programehr opened this issue

Hi,
I've run the "badnet" attack and the "neural cleanse" and "deep inspect" defenses on it several times. The results show that the defenses are not doing a very good job. Given that badnet is one of the simplest attacks, I guess that should not be the case. Could anyone please have a look at the test results?

I've used trojanzoo version 1.0.8 (with some small modifications and bug fixes of my own).
The experiment was run in a loop like this:

for ...
  run badnet
  run neural cleanse
  run deep inspect
end

About the attached files:
After each run of a defense, there are a few lines that read "soft/hard median".
They show a list of outlier indexes of the results (which I think should be the detected poisoned class).
The "soft median" line refers to outliers computed with a median that averages the two middle elements when there is an even number of data points (as NumPy does); the "hard median" instead takes the single element at floor(n/2) (the PyTorch way). A small sketch of the difference follows the file list.
defense_di_attack_badnet_cifar_multirun2.txt
defense_di_attack_badnet_mnist_multirun2.txt
defense_nc_attack_badnet_cifar_multirun2.txt
defense_nc_attack_badnet_mnist_multirun2.txt
attack_badnet_cifar_multirun2.txt
attack_badnet_mnist_multirun2.txt
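
To make the soft/hard distinction concrete, here is a minimal sketch (toy values, not taken from the logs):

import numpy as np
import torch

scores = [5.0, 18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0]  # even number of entries

# "Soft" median: NumPy averages the two middle elements of an even-length input.
soft_median = float(np.median(np.array(scores)))          # (22.0 + 23.0) / 2 = 22.5

# "Hard" median: torch.median returns a single middle element instead of averaging.
hard_median = float(torch.median(torch.tensor(scores)))   # 22.0

print(soft_median, hard_median)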

Your mark alpha value is set to 0.0, which is a fully transparent watermark. It is expected not to work.

I remember there was a change (around 3 months ago) to the alpha value to keep it consistent with the RGBA definition and to support importing RGBA watermark images. It now means opacity rather than transparency.

The default value of mark alpha in the code is 1.0.
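
To make the opacity semantics concrete, a minimal sketch of what stamping a trigger with an opacity-style alpha means (illustrative only, not the actual TrojanZoo implementation; stamp_trigger is a made-up name):

import torch

def stamp_trigger(image: torch.Tensor, mark: torch.Tensor, alpha: float) -> torch.Tensor:
    # alpha is opacity: 1.0 pastes the mark fully, 0.0 leaves the image untouched,
    # so with alpha = 0.0 the "poisoned" input is pixel-identical to the clean one.
    return (1.0 - alpha) * image + alpha * mark

x = torch.rand(3, 32, 32)
mark = torch.ones(3, 32, 32)
assert torch.equal(stamp_trigger(x, mark, 0.0), x)  # fully transparent trigger changes nothing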

But it's interesting that your attack succeeds with a fully transparent watermark. I don't think that's correct. Could you share the command you used to run BadNet? I'll check whether there is a bug in my validation method.

The clean accuracy and attack success rate shouldn't both reach over 90% under a fully transparent watermark setting.
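
For reference, this is roughly what the attack-success-rate check amounts to (a sketch only; the function and argument names are mine, not the actual TrojanZoo validation code, and stamp_trigger is the toy function from the sketch above):

import torch

@torch.no_grad()
def attack_success_rate(model, loader, mark, alpha, target_class, device="cuda"):
    # Fraction of non-target-class test images that the model classifies as the
    # target class after the trigger has been stamped onto them.
    model.eval()
    hits, total = 0, 0
    for x, y in loader:
        keep = y != target_class            # skip images already labeled with the target class
        if keep.sum() == 0:
            continue
        x_trig = stamp_trigger(x[keep], mark, alpha).to(device)
        pred = model(x_trig).argmax(dim=1).cpu()
        hits += (pred == target_class).sum().item()
        total += int(keep.sum())
    return hits / total

With alpha = 0.0 the stamped images equal the clean ones, so this fraction should stay low instead of exceeding 90%.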

Thank you for your response. Here's the script:

for i in {0..9}
do

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_attack.py --verbose 1 --dataset cifar10 --model resnet18_comp --attack badnet --device cuda --epoch 200 --save --mark_path square_white.png --mark_height 3 --mark_width 3 --height_offset 2 --width_offset 2 --batch_size 100 --pretrain >> attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset cifar10 --model resnet18_comp --attack badnet --defense deep_inspect --random_init --device cuda --save >> defense_di_attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset cifar10 --model resnet18_comp --attack badnet --defense neural_cleanse --random_init --device cuda --save >> defense_nc_attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_attack.py --verbose 1 --dataset mnist --model net --attack badnet --device cuda --epoch 200 --save --mark_path square_white.png --mark_height 3 --mark_width 3 --height_offset 2 --width_offset 2 --batch_size 100 >> attack_badnet_mnist_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset mnist --model net --attack badnet --defense deep_inspect --random_init --device cuda --save >> defense_di_attack_badnet_mnist_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset mnist --model net --attack badnet --defense neural_cleanse --random_init --device cuda --save >> defense_nc_attack_badnet_mnist_multirun2.txt

done

Which version of the code are you currently using? The latest release?
If you are using the current code on GitHub, it is Python 3.10 only.
There shouldn't be any performance difference, though.

I'm using 1.0.8

Jesus, that's quite an old version. I'll check the results based on the most up-to-date code.

In 1.0.8, the alpha value still stands for transparency, so the attack succeeds with a fully opaque watermark. This is expected.

Based on your provided Neural Cleanse results, I see it's working well. The MAD score of the target class is significantly larger than that of the other classes.

Why do you say it’s not working?

Did you check the MNIST results?

Neural Cleanse seems to be working on MNIST as well?

The target class has a significantly smaller mask norm and loss.
The MAD score may not flag it as an outlier because some classes have a very large loss and mask norm.
We should only consider the small outliers rather than the large ones.

But I have to say that the MAD score does not always exceed 2.0, as the original paper claims. Still, it works in my personal view, and the target class is an outlier based on my observations. Maybe you can try some other metric rather than the MAD.
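
As a rough sketch of what I mean by keeping only the small side, this is the MAD-based anomaly index from the Neural Cleanse paper (constant 1.4826, threshold 2.0) restricted to classes below the median; the numbers are toy values, not taken from your logs:

import torch

def anomaly_index(mask_norms: torch.Tensor) -> torch.Tensor:
    # Neural-Cleanse-style score: |x - median| / (1.4826 * MAD).
    med = mask_norms.median()
    mad = (mask_norms - med).abs().median()
    return (mask_norms - med).abs() / (1.4826 * mad)

mask_norms = torch.tensor([5.0, 18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0])
scores = anomaly_index(mask_norms)

# Only classes with an unusually *small* mask norm are backdoor candidates;
# classes with a large mask norm are ignored even if their score exceeds 2.0.
suspects = [c for c in range(len(mask_norms))
            if mask_norms[c] < mask_norms.median() and scores[c] > 2.0]
print(suspects)  # -> [0] for these toy values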

Consider this case please:

mask norms: tensor([30.3323, 50.6468, 36.4013, 32.4547, 55.6689, 44.6483, 50.8236, 51.0138,
48.9748, 56.3015], device='cuda:0')
mask MAD: tensor([2.9064, 0.2607, 1.9602, 2.5755, 1.0436, 0.6745, 0.2882, 0.3179, 0.0000,
1.1422], device='cuda:0')
loss: tensor([0.0023, 0.0041, 0.0025, 0.0025, 0.0034, 0.0030, 0.0026, 0.0043, 0.0028,
0.0035])
loss MAD: tensor([1.0691, 2.4688, 0.6407, 0.6745, 0.9954, 0.2922, 0.5311, 2.7733, 0.0000,
1.2038])
(defense_nc_attack_badnet_mnist_multirun2.txt, line 1120)

It doesn't look much like an outlier, right?

Yeah, you are correct, it doesn't look like an outlier for this run.

But I also observe that it works for many runs, especially almost every run on CIFAR10. So I think it's fine.

If you are seeking a performance improvement, I would say this is just a re-implementation of Neural Cleanse, and I will only try my best to keep it consistent with the original paper. Certainly, you are welcome to inherit the trojanzoo code and modify it to improve performance.

Sorry that I didn't take MNIST into consideration in the TrojanZoo paper; it's too naive and doesn't show the typical trends seen on more complex datasets such as CIFAR and ImageNet.

Thanks a lot. But it seems that deep inspect (di) doesn't work well either.

To be honest, I think their original method doesn't work very well compared with Neural Cleanse, which is pointed out in our paper.