Optimizer zero grad performed after ema update
NanoCode012 opened this issue · comments
Lines 313 to 316 in 96fa40a
Hello, may I ask why you perform `optimizer.zero_grad()` after `ema.update(model)`?
Also, is there a reason to call `torch.cuda.synchronize()` in `optimize`? I read that DDP performs synchronization on its own when needed.
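For context, a minimal sketch of the ordering in question (the `ModelEMA` class here is a hypothetical stand-in, not the repo's exact implementation). The key point it illustrates: the EMA update reads model *parameters*, not gradients, so calling `optimizer.zero_grad()` before or after `ema.update(model)` is equivalent:

```python
import copy
import torch
import torch.nn as nn

class ModelEMA:
    """Hypothetical minimal EMA: keeps an exponential moving
    average of the model's parameters."""
    def __init__(self, model, decay=0.999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # Reads parameter values only; gradients are never touched.
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p.detach(), alpha=1 - self.decay)

model = nn.Linear(4, 2)
ema = ModelEMA(model)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()                        # apply gradients to the weights
ema.update(model)                 # EMA snapshot of the new weights
opt.zero_grad(set_to_none=True)   # clearing grads here or before
                                  # ema.update() makes no difference
```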
These actually don't make sense. I just added them to keep the code similar to others. You can safely keep the original way.