P2333 / Bag-of-Tricks-for-AT

Empirical tricks for training robust models (ICLR 2021)


gradient accumulated!

liuxingbin opened this issue · comments

https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L186

Do the model's gradients accumulate while the adversarial examples are being computed?

The grad on delta is zeroed after each attack iteration (https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L206);

The grad on the model parameters is zeroed only before backpropagating the training loss and taking the optimizer step (https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L774)

So the gradients on the model weights accumulate while the adversarial example is being generated, since the attack runs for 10 steps and each step's backward pass also deposits gradients on the model weights.
Do you think this is a bug?
https://github.com/P2333/Bag-of-Tricks-for-AT/blob/master/train_cifar.py#L104

This part of code is actually cloned from https://github.com/locuslab/robust_overfitting/blob/master/train_cifar.py#L102.

Accumulated gradients on the model weights do not affect the gradients computed on delta (though yes, it is good practice to zero out the weights' gradients at each step). You can simply add a zero-out operation after each generation step, and the results should be unchanged.

Thanks for your reply.