P2333 / Bag-of-Tricks-for-AT

Empirical tricks for training robust models (ICLR 2021)


gradient accumulated!

liuxingbin opened this issue · comments

https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L186

Do the model's gradients accumulate while the adversarial examples are being computed?

The grad on delta is zeroed after each attack iteration (https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L206);

The grad on the model parameters is zeroed only before backpropagating the training loss and taking the optimizer step (https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L774)

So the gradients on the model weights accumulate while the adversarial example is being generated, since the attack runs for 10 steps and each step's backward pass also deposits gradients on the model weights.
Do you think this is a bug?
https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L104

This part of code is actually cloned from https://github.com/locuslab/robust_overfitting/blob/master/train_cifar.py#L102.

Accumulated gradients on the model weights do not affect the gradients computed on delta (though yes, it is good practice to zero out the weights' gradients at each step). You can simply add a zero-out operation after each generation step, and the results should be unchanged.

Thanks for your reply.