How about sgdr?

Question

twmht opened this issue 6 years ago

CLR is a special case of SGDR (with restart_period=1 and restart_mul=1). But how does SWA perform if I choose restart_period=10 and restart_mul=2? Since you always take the snapshot at the minimum learning rate of each cycle, SGDR leaves only a few snapshots to average. For example, in 100 epochs there are only 3 cycles in SGDR, so I can use only 3 snapshots for SWA.

Since CLR was not better than SGDR in my experiment, it would be much better if SGDR could work with SWA.
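The cycle count in the example above can be checked with a small sketch. It assumes each SGDR cycle starts at `restart_period` epochs and is multiplied by `restart_mul` after every restart, and that one snapshot is taken at the end of each completed cycle. The function name `sgdr_cycle_ends` is hypothetical and is not part of the SWA code.

```python
def sgdr_cycle_ends(total_epochs, restart_period, restart_mul):
    """Return the epochs at which each completed SGDR cycle ends.

    Assumption: cycle lengths are restart_period, restart_period * restart_mul,
    restart_period * restart_mul**2, ... and only cycles that finish within
    total_epochs yield a snapshot.
    """
    if restart_period < 1 or restart_mul < 1:
        raise ValueError("restart_period and restart_mul must be >= 1")

    ends = []
    period = restart_period
    epoch = 0
    # Keep adding cycles while the next one still fits in the budget.
    while epoch + period <= total_epochs:
        epoch += period
        ends.append(epoch)
        period *= restart_mul
    return ends


# Cycles of 10, 20 and 40 epochs fit; the next one (80 epochs) does not.
print(sgdr_cycle_ends(100, 10, 2))       # [10, 30, 70]
# With restart_mul=1 every cycle has the same length.
print(len(sgdr_cycle_ends(100, 10, 1)))  # 10
```

Under these assumptions, restart_period=10 with restart_mul=2 gives 3 snapshots in 100 epochs, while a fixed cycle length of 10 gives 10.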

Pavel Izmailov · Answer 1 · Wed Aug 07 2019 23:51:40 GMT+0800 (China Standard Time)

Hi, sorry for the late reply. In our experiments a simple constant learning rate worked best, but it would definitely be interesting to see how SWA would work with SGDR.