OverflowError: (34, 'Numerical result out of range')

Question

saurabh-kataria opened this issue 4 years ago

I think the DEMON-based optimizers have an issue. Line 375, momentum.div_(1 - (beta1 ** state['step'])).mul_(nu1).add_(1-nu1, grad), raises the following error after a few iterations of training:
OverflowError: (34, 'Numerical result out of range')
It seems to be a general issue:
https://discuss.pytorch.org/t/overflowerror-34-numerical-result-out-of-range/1907
I tried the recommendation from that thread and it still didn't work. My value of beta1 was around -3 when the overflow error was raised. I believe beta1 should stay in [0,1]?
The problem also occurs in the other optimizers. Removing the DEMON option seems to get rid of it.

Jishnu Ray Chowdhury · Answer 1 · Fri Sep 04 2020 23:22:42 GMT+0800 (China Standard Time)

Yes, beta should be in [0,1]. I think this may be because step becomes > self.T, where self.T = self.epochs*self.step_per_epoch.
That is, to use DEMON properly, you have to pass step_per_epoch and the maximum number of epochs to the constructor when initializing the optimizer. Are you sure that state["step"] never becomes greater than epochs*step_per_epoch in your case? There could still be some other issue. Let me know if you find something related to this; otherwise I will check more deeply.
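To illustrate why step > self.T would produce this error, here is a minimal pure-Python sketch. It assumes the decay rule from the DEMON paper, beta_t = beta_init * (1 - t/T) / ((1 - beta_init) + beta_init * (1 - t/T)); the function names demon_beta and demon_beta_clamped are hypothetical and are not taken from this repository. Once t passes T, the (1 - t/T) term goes negative, beta can leave [0,1], and raising it to the step count can overflow a float.

```python
def demon_beta(step, total_steps, beta_init=0.9):
    """Assumed DEMON schedule: decays from beta_init at step 0 to 0 at total_steps."""
    remaining = 1.0 - step / total_steps  # goes negative once step > total_steps
    return beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)


def demon_beta_clamped(step, total_steps, beta_init=0.9):
    """Hypothetical guard: treat every step past total_steps as the final step."""
    return demon_beta(min(step, total_steps), total_steps, beta_init)


total_steps = 1000  # stands in for epochs * step_per_epoch

# Inside the planned horizon, beta stays in [0, 1].
print(demon_beta(0, total_steps))
print(demon_beta(total_steps, total_steps))

# Past the horizon, beta is negative with magnitude above 1.
step = 1080
bad_beta = demon_beta(step, total_steps)
print(bad_beta)

# beta ** step, as in the bias-correction term, then overflows a Python float.
try:
    bad_beta ** step
    overflowed = False
except OverflowError as err:
    overflowed = True
    print("OverflowError:", err)

# With the clamp, beta stays at 0 after the horizon and the power is finite.
print(demon_beta_clamped(step, total_steps) ** step)
```

The clamp is only one possible guard. The fix discussed in this thread is to pass the correct epochs and step_per_epoch so that step never exceeds T in the first place.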

Saurabh · Answer 2 · Sat Sep 05 2020 00:45:05 GMT+0800 (China Standard Time)

I think you are correct. The DEMON versions require correct values for self.epochs and self.step_per_epoch. I was specifying the number of epochs incorrectly. It works now. Thanks for your response.
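For anyone hitting the same error, one way to derive the two values before constructing the optimizer is sketched below. It assumes one optimizer step per batch (no gradient accumulation); the helper demon_total_steps and the dataset numbers are hypothetical and not part of this repository.

```python
import math


def demon_total_steps(num_samples, batch_size, epochs, drop_last=False):
    """Return (step_per_epoch, total_steps), assuming one optimizer step per batch."""
    if drop_last:
        # The final incomplete batch is skipped.
        step_per_epoch = num_samples // batch_size
    else:
        # The final incomplete batch still counts as a step.
        step_per_epoch = math.ceil(num_samples / batch_size)
    return step_per_epoch, epochs * step_per_epoch


# Hypothetical training setup: 50,000 samples, batch size 128, 30 epochs.
step_per_epoch, total_steps = demon_total_steps(50_000, 128, epochs=30)
print(step_per_epoch, total_steps)
```

If training runs for more epochs than the value given to the optimizer, state["step"] will exceed epochs*step_per_epoch and beta can leave [0,1] again.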