ExplainableML / BayesCap

(ECCV 2022) BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

Which resi implementation is used in the experiments?

Oceanlib opened this issue

Hi,
Thanks for your code.
I found that there are two implementations of resi, and the one that is commented out is consistent with Equation (10) in the paper.
So which one was actually used in the experiments?

# Equation (10) form: raise the scaled residual to the power beta1
# resi = torch.pow(resi*one_over_alpha1, beta1).clamp(min=self.resi_min, max=self.resi_max)
# Active form: scale the residual by beta1 instead of exponentiating
resi = (resi * one_over_alpha1 * beta1).clamp(min=self.resi_min, max=self.resi_max)
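(If I read the code right, the commented line computes the residual term of Equation (10), i.e. something like ( |x - x̂| / α )^β with the per-pixel α and β predicted by the network, while the active line replaces the power with a plain product, ( |x - x̂| / α ) · β.)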

Hi,
We noticed that, in practice, the commented line
resi = torch.pow(resi*one_over_alpha1, beta1).clamp(min=self.resi_min, max=self.resi_max)
is harder to tame during training: you hit NaNs more often if you do not tune your learning rate properly.
The uncommented line leads to faster convergence.
However, both can be made to work well.
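For concreteness, here is a minimal sketch (not the repo's training code) contrasting the two terms on made-up residuals and made-up alpha/beta values; all tensor values and the clamp bounds below are hypothetical stand-ins for what the network would predict:

# A hypothetical sketch, not the authors' code: compares the two residual
# terms on made-up values to show why the power form is harder to tame.
import torch

resi = torch.tensor([0.5, 5.0, 50.0])             # made-up |x - x_hat| residuals
one_over_alpha1 = torch.tensor([2.0, 2.0, 2.0])   # made-up 1/alpha predictions
beta1 = torch.tensor([0.5, 2.0, 4.0])             # made-up beta predictions
resi_min, resi_max = 1e-4, 1e3                    # stand-ins for self.resi_min/max

# Equation (10) form: (resi / alpha) ** beta. Grows exponentially in beta,
# so large residuals or large beta can overflow before the clamp applies.
power_form = torch.pow(resi * one_over_alpha1, beta1).clamp(min=resi_min, max=resi_max)

# Released form: linear in resi, so the term and its gradient stay bounded.
linear_form = (resi * one_over_alpha1 * beta1).clamp(min=resi_min, max=resi_max)

print(power_form)   # tensor([1., 100., 1000.]) -- 100**4 = 1e8 hits the clamp
print(linear_form)  # tensor([0.5, 20., 400.])

Even before the clamp kicks in, the power form's magnitude and gradient grow exponentially with beta1, which matches the learning-rate sensitivity described above; the linear surrogate trades fidelity to Equation (10) for bounded, better-behaved gradients.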

Thanks for your reply.