ain-soph / trojanzoo

TrojanZoo provides a universal PyTorch platform to conduct security research (especially on backdoor attacks and defenses) for image classification in deep learning.

Home Page: https://ain-soph.github.io/trojanzoo

Poor performance of defenses on badnet

programehr opened this issue

Hi,
I've run the "badnet" attack and the "neural cleanse" and "deep inspect" defenses on it several times. The results show that the defenses are not doing a very good job. Given that badnet is one of the simplest attacks, I guess that should not be the case. Could anyone please have a look at the test results?

I've used trojanzoo version 1.0.8 (with some small modifications and bug fixes of my own).
The experiment was run in a loop like this:

for ...
  run badnet
  run neural cleanse
  run deep inspect
end

About the attached files:
After each run of a defense, there are a few lines that read "soft/hard median".
They show a list of outlier indexes of the results (which I think should be the detected poisoned class).
The "soft median" line refers to outliers computed with a median that averages the two middle elements when there is an even number of data points (as NumPy does); the "hard median" instead takes the single element at floor(n/2) (the PyTorch way). A small sketch of the difference follows the file list.
defense_di_attack_badnet_cifar_multirun2.txt
defense_di_attack_badnet_mnist_multirun2.txt
defense_nc_attack_badnet_cifar_multirun2.txt
defense_nc_attack_badnet_mnist_multirun2.txt
attack_badnet_cifar_multirun2.txt
attack_badnet_mnist_multirun2.txt
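
To make the soft/hard distinction concrete, here is a minimal sketch (toy values, not taken from the logs):

import numpy as np
import torch

scores = [5.0, 18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0]  # even number of entries

# "Soft" median: NumPy averages the two middle elements of an even-length input.
soft_median = float(np.median(np.array(scores)))          # (22.0 + 23.0) / 2 = 22.5

# "Hard" median: torch.median returns a single middle element instead of averaging.
hard_median = float(torch.median(torch.tensor(scores)))   # 22.0

print(soft_median, hard_median)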

Your mark alpha value is set to 0.0, which is a fully transparent watermark. It is expected not to work.

I remember there was a change (around 3 months ago) to the alpha value to keep it consistent with the RGBA definition and to support importing RGBA watermark images. It now means opacity rather than transparency.

The default value of mark alpha in the code is 1.0.
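
To make the opacity semantics concrete, a minimal sketch of what stamping a trigger with an opacity-style alpha means (illustrative only, not the actual TrojanZoo implementation; stamp_trigger is a made-up name):

import torch

def stamp_trigger(image: torch.Tensor, mark: torch.Tensor, alpha: float) -> torch.Tensor:
    # alpha is opacity: 1.0 pastes the mark fully, 0.0 leaves the image untouched,
    # so with alpha = 0.0 the "poisoned" input is pixel-identical to the clean one.
    return (1.0 - alpha) * image + alpha * mark

x = torch.rand(3, 32, 32)
mark = torch.ones(3, 32, 32)
assert torch.equal(stamp_trigger(x, mark, 0.0), x)  # fully transparent trigger changes nothing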

But it's interesting that your attack succeeds with a fully transparent watermark. I don't think that's correct. Could you share the command you used to run BadNet? I'll check whether there is a bug in my validation method.

The clean accuracy and attack success rate shouldn't both reach over 90% under a fully transparent watermark setting.
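
For reference, this is roughly what the attack-success-rate check amounts to (a sketch only; the function and argument names are mine, not the actual TrojanZoo validation code, and stamp_trigger is the toy function from the sketch above):

import torch

@torch.no_grad()
def attack_success_rate(model, loader, mark, alpha, target_class, device="cuda"):
    # Fraction of non-target-class test images that the model classifies as the
    # target class after the trigger has been stamped onto them.
    model.eval()
    hits, total = 0, 0
    for x, y in loader:
        keep = y != target_class            # skip images already labeled with the target class
        if keep.sum() == 0:
            continue
        x_trig = stamp_trigger(x[keep], mark, alpha).to(device)
        pred = model(x_trig).argmax(dim=1).cpu()
        hits += (pred == target_class).sum().item()
        total += int(keep.sum())
    return hits / total

With alpha = 0.0 the stamped images equal the clean ones, so this fraction should stay low instead of exceeding 90%.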

Thank you for your response. Here's the script:

for i in {0..9}
do

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_attack.py --verbose 1 --dataset cifar10 --model resnet18_comp --attack badnet --device cuda --epoch 200 --save --mark_path square_white.png --mark_height 3 --mark_width 3 --height_offset 2 --width_offset 2 --batch_size 100 --pretrain >> attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset cifar10 --model resnet18_comp --attack badnet --defense deep_inspect --random_init --device cuda --save >> defense_di_attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset cifar10 --model resnet18_comp --attack badnet --defense neural_cleanse --random_init --device cuda --save >> defense_nc_attack_badnet_cifar_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_attack.py --verbose 1 --dataset mnist --model net --attack badnet --device cuda --epoch 200 --save --mark_path square_white.png --mark_height 3 --mark_width 3 --height_offset 2 --width_offset 2 --batch_size 100 >> attack_badnet_mnist_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset mnist --model net --attack badnet --defense deep_inspect --random_init --device cuda --save >> defense_di_attack_badnet_mnist_multirun2.txt

  CUDA_VISIBLE_DEVICES=1 python3.9 ./examples/backdoor_defense.py --verbose 1 --validate_interval 1 --dataset mnist --model net --attack badnet --defense neural_cleanse --random_init --device cuda --save >> defense_nc_attack_badnet_mnist_multirun2.txt

done

Which version of the code are you currently using? The latest release?
If you are using the current code on GitHub, it is Python 3.10 only.
There shouldn't be any performance difference, though.

I'm using 1.0.8

Jesus, that's quite an old version. I'll check the results based on the most up-to-date code.

In 1.0.8, the alpha value still stands for transparency, so the attack succeeds with a fully opaque watermark. This is expected.

Based on your provided Neural Cleanse results, I see it's working well. The MAD score of the target class is significantly larger than that of the other classes.

Why do you say it’s not working?

Did you check the MNIST results?

Neural Cleanse seems to be working on MNIST as well?

The target class has a significantly smaller mask norm and loss.
The MAD score may not flag it as an outlier because some classes have a very large loss and mask norm.
We should only consider the small outliers rather than the large ones.

But I have to say that the MAD score does not always exceed 2.0, as the original paper claims. Still, it works in my personal view, and the target class is an outlier based on my observations. Maybe you can try some other metric rather than the MAD.
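
As a rough sketch of what I mean by keeping only the small side, this is the MAD-based anomaly index from the Neural Cleanse paper (constant 1.4826, threshold 2.0) restricted to classes below the median; the numbers are toy values, not taken from your logs:

import torch

def anomaly_index(mask_norms: torch.Tensor) -> torch.Tensor:
    # Neural-Cleanse-style score: |x - median| / (1.4826 * MAD).
    med = mask_norms.median()
    mad = (mask_norms - med).abs().median()
    return (mask_norms - med).abs() / (1.4826 * mad)

mask_norms = torch.tensor([5.0, 18.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0])
scores = anomaly_index(mask_norms)

# Only classes with an unusually *small* mask norm are backdoor candidates;
# classes with a large mask norm are ignored even if their score exceeds 2.0.
suspects = [c for c in range(len(mask_norms))
            if mask_norms[c] < mask_norms.median() and scores[c] > 2.0]
print(suspects)  # -> [0] for these toy values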

Consider this case please:

mask norms: tensor([30.3323, 50.6468, 36.4013, 32.4547, 55.6689, 44.6483, 50.8236, 51.0138,
48.9748, 56.3015], device='cuda:0')
mask MAD: tensor([2.9064, 0.2607, 1.9602, 2.5755, 1.0436, 0.6745, 0.2882, 0.3179, 0.0000,
1.1422], device='cuda:0')
loss: tensor([0.0023, 0.0041, 0.0025, 0.0025, 0.0034, 0.0030, 0.0026, 0.0043, 0.0028,
0.0035])
loss MAD: tensor([1.0691, 2.4688, 0.6407, 0.6745, 0.9954, 0.2922, 0.5311, 2.7733, 0.0000,
1.2038])
(defense_nc_attack_badnet_mnist_multirun2.txt, line 1120)

It doesn't look much like an outlier, right?

Yeah, you are correct, it doesn't look like an outlier for this run.

But I also observe that it works for many runs, especially almost every run on CIFAR10. So I think it's fine.

If you are seeking a performance improvement, I would say this is just a re-implementation of Neural Cleanse, and I will only try my best to keep it consistent with the original paper. Certainly, you are welcome to inherit the trojanzoo code and modify it to improve performance.

Sorry that I didn't take MNIST into consideration in the TrojanZoo paper; it's too naive and doesn't show the typical trends seen on more complex datasets such as CIFAR and ImageNet.

Thanks a lot. But it seems that deep inspect (di) doesn't work well either.

To be honest, I think their original method doesn't work very well compared with Neural Cleanse, which is pointed out in our paper.