vballoli / nfnets-pytorch

NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch. Find explanation at tourdeml.github.io/blog/

Home Page: https://nfnets-pytorch.readthedocs.io/en/latest/

AGC without modifying the optimizer

kayuksel opened this issue · comments

Hello,

Is there a way to apply AGC externally without modifying the optimizer code?

I am using optimizers from the torch_optimizer package, so that would be very helpful.

Yeah I think it should be possible by creating a wrapper for any optimizer. I'll try to add this as soon as possible, but for anyone who's interested in a quick implementation:

import torch

# unitwise_norm: unit-wise L2 norm helper used by this repo (a sketch is given further below)


class AGC:
    """Thin wrapper: clip gradients unit-wise (AGC), then step the wrapped optimizer."""

    def __init__(self, optim, clipping=1e-2, eps=1e-3):
        self.optim = optim
        self.clipping = clipping
        self.eps = eps

    @property
    def param_groups(self):
        # Expose the wrapped optimizer's param groups (useful for LR schedulers, logging, ...)
        return self.optim.param_groups

    def zero_grad(self, *args, **kwargs):
        self.optim.zero_grad(*args, **kwargs)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Unit-wise parameter norm, floored at eps so small weights still get a usable threshold
                param_norm = torch.max(unitwise_norm(p),
                                       torch.tensor(self.eps, device=p.device))
                grad_norm = unitwise_norm(p.grad)
                max_norm = param_norm * self.clipping

                # Rescale only the units whose gradient norm exceeds clipping * parameter norm
                trigger = grad_norm > max_norm
                clipped_grad = p.grad * (max_norm / torch.max(
                    grad_norm, torch.tensor(1e-6, device=grad_norm.device)))
                p.grad.copy_(torch.where(trigger, clipped_grad, p.grad))

        return self.optim.step(closure)

Note that the above code is a raw implementation, but the actual code will be pretty close to this. Hope this helps @kayuksel
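
In case it helps, here is a minimal sketch of the unitwise_norm helper the snippet assumes, plus a hypothetical usage example wrapping an optimizer from torch_optimizer (Lamb picked arbitrarily). The reduction axes are an assumption for illustration and may need adjusting to your weight layouts:

import torch
import torch_optimizer  # any torch.optim-style optimizer can be wrapped the same way


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    # Sketch of a unit-wise L2 norm: biases get a single norm, weight tensors get
    # one norm per output unit ([out, in] linear weights, [out, in, kH, kW] conv weights).
    # The reduction axes here are an assumption; check them against your layer layouts.
    if x.ndim <= 1:
        return x.norm(p=2)
    return x.norm(p=2, dim=tuple(range(1, x.ndim)), keepdim=True)


# Hypothetical usage: wrap an optimizer from torch_optimizer with the AGC class above
model = torch.nn.Linear(10, 2)
base_optimizer = torch_optimizer.Lamb(model.parameters(), lr=1e-3)
optimizer = AGC(base_optimizer, clipping=1e-2)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()       # clips gradients unit-wise, then delegates to Lamb
optimizer.zero_grad()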

Added the generic AGC in 658675f, but it still needs testing. Do let me know how it works out for you.
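
In the meantime, a rough sanity check of the wrapper sketched earlier in this thread (not the committed code) could look like the following. It assumes the AGC and unitwise_norm sketches above, and uses SGD with lr=0.0 so the parameters stay fixed while the clipped gradients are inspected:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 4)
# lr=0.0 keeps the parameters unchanged, so the clipping bound can be checked after step()
optimizer = AGC(torch.optim.SGD(model.parameters(), lr=0.0), clipping=1e-2)

model(torch.randn(16, 8)).pow(2).sum().backward()
optimizer.step()

for p in model.parameters():
    # Defaults used by the wrapper above: eps=1e-3, clipping=1e-2
    max_norm = torch.max(unitwise_norm(p), torch.tensor(1e-3)) * 1e-2
    # After clipping, no unit's gradient norm should exceed clipping * parameter norm
    assert torch.all(unitwise_norm(p.grad) <= max_norm + 1e-6)
print("AGC sanity check passed")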

@vballoli I have tested it. It worked without problems for me.