taki0112 / RAdam-Tensorflow

In the algorithm outlined in the original paper, the adapted momentum update is applied when ρt > 4. In the code, however, the condition is sma_t >= 5.0.

RAdam-Tensorflow/RAdam.py, line 99 in 29328c3:

var_t = tf.cond(sma_t >= 5.0, lambda : r_t * mhat_t / vhat_t, lambda : mhat_t)

Is there any reason for this?
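
To make the difference concrete, here is a minimal pure-Python sketch that evaluates ρt from the paper's formula, ρ∞ = 2/(1 − β2) − 1 and ρt = ρ∞ − 2tβ2^t/(1 − β2^t), and lists the early steps where the two conditions disagree. The helper names (`rho_inf`, `rho_t`, `differing_steps`) and the choice β2 = 0.999 are my own assumptions for illustration; they are not taken from RAdam.py.

```python
def rho_inf(beta2: float) -> float:
    """Maximum length of the approximated SMA, as defined in the RAdam paper."""
    return 2.0 / (1.0 - beta2) - 1.0


def rho_t(t: int, beta2: float) -> float:
    """Length of the approximated SMA at step t (t starts at 1)."""
    beta2_t = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * beta2_t / (1.0 - beta2_t)


def differing_steps(beta2: float, max_step: int = 20) -> list:
    """Steps where the paper's rule (rho_t > 4) and the code's rule (rho_t >= 5.0) disagree."""
    steps = []
    for t in range(1, max_step + 1):
        r = rho_t(t, beta2)
        paper_rule = r > 4.0   # threshold from the paper's algorithm
        code_rule = r >= 5.0   # threshold used on the quoted line
        if paper_rule != code_rule:
            steps.append(t)
    return steps


if __name__ == "__main__":
    beta2 = 0.999  # assumed value, chosen only for this illustration
    for t in range(1, 8):
        print(t, round(rho_t(t, beta2), 4))
    print("steps where the rules disagree:", differing_steps(beta2))
```

Under this assumption ρt stays slightly below t during the first few steps, so the two rules should only disagree at step 5: the paper's rule would already apply the rectified update there, while the code's rule would wait until step 6.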

Difference in SMA threshold between code and paper