bboylyg / NAD

This is an implementation demo of the ICLR 2021 paper [Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks](https://openreview.net/pdf?id=9l0K4OM-oXE) in PyTorch.

Is the loss function actually useful in the experiment?

xiajun112233 opened this issue

Hello, I'm very interested in this paper. Looking at main.py, the three at_loss terms use .detach(), which takes them out of the PyTorch computation graph, so I deleted at1_loss, at2_loss, and at3_loss from the loss function. But when I ran the changed code, the ASR was still very low, so I think the at_loss terms are not useful in the code. The training dataset in the main code is the clean dataset, not the backdoored dataset, so the NAD ASR is very low; however, the training dataset in the train_badnets code is the backdoored dataset, so the baseline ASR is high. I changed the training dataset in the main code to the backdoored dataset, but unfortunately NAD is not effective on the backdoored dataset.
[screenshot]
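
For reference, an attention-transfer style term is usually written so that only the teacher's attention map is detached, which keeps the gradient path through the student. Here is a minimal sketch (the names `attention_map` and `at_loss` are illustrative, not necessarily the repository's exact code):

```python
import torch.nn.functional as F

def attention_map(fm, p=2):
    """Spatial attention map: mean of activation^p over channels, flattened and L2-normalized."""
    am = fm.pow(p).mean(dim=1)                   # (B, H, W)
    return F.normalize(am.view(am.size(0), -1))  # (B, H*W)

def at_loss(student_fm, teacher_fm, p=2):
    """Attention-distillation term for one pair of feature maps.

    Only the teacher's map is detached (the teacher is frozen);
    gradients still flow into the student through student_fm.
    """
    return (attention_map(student_fm, p) -
            attention_map(teacher_fm, p).detach()).pow(2).mean()
```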

Hi, thanks for your interest in our work. To verify the effectiveness of NAD, you could fine-tune the backdoored student with and without the NAD loss, i.e. setting at1_loss, at2_loss, and at3_loss all to non-zero values or all to zero, and compare the ASR under the two settings.
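
One simple way to wire up that ablation, sketched under the assumption that the training loop already computes `ce_loss`, the three attention terms, and the per-layer beta weights (the helper below is illustrative, not the repository's code):

```python
def total_loss(ce_loss, at_losses, betas, use_nad=True):
    """Combine cross-entropy with the NAD attention terms.

    use_nad=False gives the plain fine-tuning baseline (CE only);
    at_losses = (at1_loss, at2_loss, at3_loss) and betas are the
    per-layer weights taken from the existing training script.
    """
    loss = ce_loss
    if use_nad:
        for beta, at in zip(betas, at_losses):
            loss = loss + beta * at
    return loss
```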

without at_loss:
[screenshots]

with at_loss:
[screenshots]

Thanks for providing the screenshots. It is clear that a better erasing result is achieved with the NAD loss (the ASR decreases to 3.78%, compared to the result without the NAD loss). By the way, the choice of trigger type, teacher model, and data augmentation technique also affects the erasing effect of distillation.

But when I run the training code without the NAD loss, the ASR result is also good, so I think the result with the CE loss on the clean dataset is just random; you can see the pictures below. Is retraining the backdoored model on the clean dataset alone good enough to defend against the backdoor attack? Thank you.
[screenshots]
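
For a fair comparison it also helps to fix how ASR is measured. A minimal sketch, assuming a poisoned test loader that stamps the trigger on every image and relabels it with the attacker's target class (the function and loader names are illustrative):

```python
import torch

@torch.no_grad()
def attack_success_rate(model, poisoned_loader, device="cuda"):
    """Percentage of triggered test images classified as the attacker's target label.

    Assumes `poisoned_loader` yields (triggered_image, target_label) pairs
    and that `model(images)` returns logits.
    """
    model.eval()
    hits, total = 0, 0
    for images, targets in poisoned_loader:
        images, targets = images.to(device), targets.to(device)
        preds = model(images).argmax(dim=1)
        hits += (preds == targets).sum().item()
        total += targets.size(0)
    return 100.0 * hits / total
```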

To be honest, it is not surprising that fine-tuning can effectively erase the BadNets attack; the erasing effect is probably attributable to the data augmentation techniques, i.e. padding, flipping, and Cutout, as they are highly related to the original trigger pattern. You can change the Cutout parameters to 1 hole with a smaller size of 9 or 4 to verify this observation. By the way, I think the adaptive attacks shown in Appendix K (Table 9) of our paper will be helpful for understanding our NAD.
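
For reference, the commonly used Cutout transform looks like the sketch below (following the widely used open-source implementation; the repository's own class may differ), configured with 1 hole of side length 9:

```python
import numpy as np
import torch

class Cutout:
    """Randomly mask out square patches of an image tensor (C, H, W)."""

    def __init__(self, n_holes=1, length=9):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):
        h, w = img.size(1), img.size(2)
        mask = np.ones((h, w), np.float32)
        for _ in range(self.n_holes):
            # Pick a random center and zero out a length x length square around it.
            y, x = np.random.randint(h), np.random.randint(w)
            y1, y2 = np.clip(y - self.length // 2, 0, h), np.clip(y + self.length // 2, 0, h)
            x1, x2 = np.clip(x - self.length // 2, 0, w), np.clip(x + self.length // 2, 0, w)
            mask[y1:y2, x1:x2] = 0.0
        return img * torch.from_numpy(mask).expand_as(img)
```

For example, appending `Cutout(n_holes=1, length=9)` (or `length=4`) after `transforms.ToTensor()` in the training transform would apply it.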

OK, thank you. Which parameters in the code should I change to run the adaptive attacks?

The simplest case is to change the location of the backdoor trigger (i.e. the BadNets trigger) from the bottom-right corner to the center of the image.
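
A minimal sketch of that change, assuming CHW image tensors and a small solid square patch (the helper name, patch size, and value are illustrative, not the repository's poisoning code):

```python
def stamp_trigger(img, size=3, value=1.0, location="center"):
    """Paste a square trigger patch onto an image tensor (C, H, W).

    location="bottom_right" mimics the usual BadNets placement;
    location="center" is the adaptive variant discussed above.
    """
    _, h, w = img.shape
    if location == "center":
        y0, x0 = (h - size) // 2, (w - size) // 2
    else:  # bottom-right corner
        y0, x0 = h - size, w - size
    out = img.clone()
    out[:, y0:y0 + size, x0:x0 + size] = value
    return out
```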