Which weight decay?
SiBensberg opened this issue · comments
Hi,
In your paper at 2) Training details
you used a weight decay of 0.5, but the readme notes a different weight decay of 0.001 at the bottom. Which one did you use, or am I confusing something here?
Hey @SiBensberg,
They are two different types of decay. The 0.5 is the decay of the learning rate, which belongs to the scheduler. The latter (0.001) is the weight decay applied to the model's parameters, which regularizes the model against overfitting.
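A minimal sketch of the distinction, assuming a step-based learning-rate schedule and plain SGD-style weight decay (the function names and the `decay_every` parameter are illustrative, not from the repo; the 0.5 and 0.001 values come from this thread):

```python
def lr_schedule(base_lr, step, decay_every, gamma=0.5):
    """Learning-rate decay: multiply the lr by gamma every `decay_every` steps."""
    return base_lr * gamma ** (step // decay_every)

def sgd_update(w, grad, lr, weight_decay=0.001):
    """Weight decay: shrink each parameter toward zero via an L2 penalty term."""
    return w - lr * (grad + weight_decay * w)

# The scheduler halves the step size over training...
lr = lr_schedule(0.1, step=30, decay_every=30)   # 0.1 -> 0.05

# ...while weight decay nudges weights toward zero on every update.
w = sgd_update(1.0, grad=0.0, lr=0.1)            # 1.0 -> 0.9999
```

In PyTorch terms these would roughly correspond to a scheduler like `StepLR(optimizer, step_size=..., gamma=0.5)` and an optimizer constructed with `weight_decay=0.001`.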
Hope this clarifies your confusion.
Hi @donkeymouse ,
thank you for your reply. I had already guessed as much, I just wanted to be sure.