Simple Tensorflow implementation of "On The Variance Of The Adaptive Learning Rate And Beyond"
joeforan76 opened this issue 5 years ago · comments
In the algorithm outlined in the original paper, the threshold for whether adaptive momentum is applied is ρ_t > 4. Looking at the code, however, the threshold used is 5.0:
RAdam-Tensorflow/RAdam.py, line 99 in 29328c3
Is there any reason for this?
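
For reference, here is a minimal sketch of how I understand the ρ_t computation from the paper, to show what the two thresholds change in practice. It is plain Python rather than the repo's TensorFlow code, the function names are my own (hypothetical, not taken from RAdam.py), and it assumes β2 = 0.999 and a strict `>` comparison.

```python
import math


def rho_inf(beta2):
    # Maximum length of the approximated SMA, as defined in the paper.
    return 2.0 / (1.0 - beta2) - 1.0


def rho_t(t, beta2):
    # Length of the approximated SMA at step t (t starts at 1).
    beta2_t = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * beta2_t / (1.0 - beta2_t)


def rectification(t, beta2):
    # Variance rectification term r_t; only meaningful when rho_t > 4.
    r_inf = rho_inf(beta2)
    r_t = rho_t(t, beta2)
    num = (r_t - 4.0) * (r_t - 2.0) * r_inf
    den = (r_inf - 4.0) * (r_inf - 2.0) * r_t
    return math.sqrt(num / den)


def first_rectified_step(threshold, beta2=0.999, max_steps=1000):
    # Hypothetical helper: first step whose rho_t exceeds the threshold.
    for t in range(1, max_steps + 1):
        if rho_t(t, beta2) > threshold:
            return t
    return None


if __name__ == "__main__":
    for threshold in (4.0, 5.0):
        print(threshold, first_rectified_step(threshold))
```

With β2 = 0.999, ρ_t stays just below t during the first steps, so a threshold of 4 switches to the rectified update at step 5, while a threshold of 5.0 delays it to step 6. If that reading is right, the practical difference is one extra step on the un-adapted update.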