Optimizer and Weight decay
nw89 opened this issue
nw89 commented
Regarding Section 6.4 in the paper, do you use actual weight decay or a simple L2 regularisation term on the weights? Is the optimizer plain SGD or something like Adam?
Michaël Ramamonjisoa commented
I used the SGD optimizer with momentum=0.9, weight_decay=2e-6 (typically), and a learning rate decay of the form learning_rate = learning_rate * (1 - epoch / max_epoch) ^ 0.9.
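For concreteness, here is a minimal sketch of that setup, assuming PyTorch (the framework is not stated in this thread) and interpreting the schedule as the usual "poly" policy applied to the initial learning rate; `model`, `base_lr`, and `max_epoch` are placeholders. Note that PyTorch's `weight_decay` argument for SGD is implemented as an L2 penalty added to the gradient.

```python
import torch

# Placeholder model and training-length settings, for illustration only.
model = torch.nn.Linear(10, 1)
max_epoch = 100
base_lr = 0.01  # initial learning rate (not specified in the thread)

# SGD with momentum and weight decay as described above.
optimizer = torch.optim.SGD(
    model.parameters(), lr=base_lr, momentum=0.9, weight_decay=2e-6
)

# Polynomial ("poly") decay: lr = base_lr * (1 - epoch / max_epoch) ** 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: (1 - epoch / max_epoch) ** 0.9
)

for epoch in range(max_epoch):
    # ... run one training epoch here ...
    scheduler.step()  # apply the poly decay at the end of each epoch
```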
nw89 commented
Brilliant, thank you very much!