taki0112 / RAdam-Tensorflow

Simple Tensorflow implementation of "On The Variance Of The Adaptive Learning Rate And Beyond"

Difference in SMA threshold between code and paper

joeforan76 opened this issue

In the algorithm given in the original paper, the rectified adaptive momentum update is applied only when ρ_t > 4; otherwise the un-adapted momentum is used. The code, however, uses a threshold of 5.0 (sma_t is the code's name for ρ_t):

var_t = tf.cond(sma_t >= 5.0, lambda : r_t * mhat_t / vhat_t, lambda : mhat_t)

Is there any reason for this?
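The sketch below makes the difference concrete. It computes ρ_t from the paper's definitions, ρ_∞ = 2/(1 − β2) − 1 and ρ_t = ρ_∞ − 2tβ2^t/(1 − β2^t), and finds the first step at which each condition holds. It assumes β2 = 0.999 (Adam's common default). The helper names `rho_inf`, `rho_t` and `first_step` are hypothetical and are not part of the repository.

```python
def rho_inf(beta2: float) -> float:
    """Maximum length of the approximated SMA: 2 / (1 - beta2) - 1."""
    return 2.0 / (1.0 - beta2) - 1.0


def rho_t(t: int, beta2: float) -> float:
    """Length of the approximated SMA at step t (t >= 1)."""
    b_t = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * b_t / (1.0 - b_t)


def first_step(condition, beta2: float, max_t: int = 100) -> int:
    """Return the first step t at which condition(rho_t(t)) is true."""
    for t in range(1, max_t + 1):
        if condition(rho_t(t, beta2)):
            return t
    raise ValueError("condition never held within max_t steps")


beta2 = 0.999  # assumption: Adam's common default, not taken from the issue

# Print rho_t for the first few steps to see how it grows.
for t in range(1, 8):
    print(t, round(rho_t(t, beta2), 4))

paper_t = first_step(lambda r: r > 4.0, beta2)   # paper: rho_t > 4
code_t = first_step(lambda r: r >= 5.0, beta2)   # code: sma_t >= 5.0
print("paper (rho_t > 4):", paper_t)   # prints 5
print("code (rho_t >= 5):", code_t)    # prints 6
```

Under this assumption ρ_t increases with t and stays close to t for small t (ρ_5 ≈ 4.996), so ρ_t > 4 first holds at t = 5 while ρ_t >= 5 first holds at t = 6. The two conditions therefore disagree only at step 5, where the paper's rule would already apply the rectified update and the code still falls back to mhat_t.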