Optimizer and Weight decay
nw89 opened this issue
nw89 commented
Regarding Section 6.4 in the paper, do you use actual weight decay or a simple L2 regularisation term on the weights? Is the optimizer plain SGD or something like Adam?
Michaël Ramamonjisoa commented
I used the SGD optimizer with momentum=0.9, weight_decay=2e-6 (typically), and a learning rate decay of the form learning_rate = learning_rate * (1 - epoch / max_epoch) ^ 0.9.
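For concreteness, here is a minimal sketch of that setup, assuming PyTorch (the framework is not stated in this thread) and interpreting the schedule as the usual "poly" policy applied to the initial learning rate; `model`, `base_lr`, and `max_epoch` are placeholders. Note that PyTorch's `weight_decay` argument for SGD is implemented as an L2 penalty added to the gradient.

```python
import torch

# Placeholder model and training-length settings, for illustration only.
model = torch.nn.Linear(10, 1)
max_epoch = 100
base_lr = 0.01  # initial learning rate (not specified in the thread)

# SGD with momentum and weight decay as described above.
optimizer = torch.optim.SGD(
    model.parameters(), lr=base_lr, momentum=0.9, weight_decay=2e-6
)

# Polynomial ("poly") decay: lr = base_lr * (1 - epoch / max_epoch) ** 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: (1 - epoch / max_epoch) ** 0.9
)

for epoch in range(max_epoch):
    # ... run one training epoch here ...
    scheduler.step()  # apply the poly decay at the end of each epoch
```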
nw89 commented
Brilliant, thank you very much!