timgaripov / swa

Stochastic Weight Averaging in PyTorch

How about sgdr?

twmht opened this issue

CLR is a special case of SGDR (with restart_period=1 and restart_mul=1). But how does SWA perform if I choose restart_period=10 and restart_mul=2? Since you always take the snapshot at the minimum learning rate of each cycle, SGDR yields only a few snapshots to average. For example, in 100 epochs SGDR completes only 3 cycles, so there are only 3 snapshots I can use for SWA.
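The cycle count above can be checked with a small sketch. The helper `sgdr_cycle_ends` is a hypothetical name, not part of this repository, and it assumes the first cycle lasts `restart_period` epochs and each later cycle is `restart_mul` times longer than the previous one:

```python
def sgdr_cycle_ends(total_epochs, restart_period, restart_mul):
    """Return the 1-indexed epochs at which each complete SGDR cycle ends.

    Hypothetical helper: assumes cycle lengths grow geometrically
    (restart_period, restart_period * restart_mul, ...).
    """
    ends = []
    period = restart_period
    end = period
    while end <= total_epochs:
        ends.append(end)        # one snapshot per completed cycle
        period *= restart_mul   # the next cycle is restart_mul times longer
        end += period
    return ends


# Cycles of 10, 20 and 40 epochs fit into 100 epochs; the next one would end at 150.
print(sgdr_cycle_ends(100, 10, 2))       # [10, 30, 70]
# With restart_mul=1 every cycle has the same length, giving 10 snapshots.
print(len(sgdr_cycle_ends(100, 10, 1)))  # 10
```

Under these assumptions only three snapshots are available after 100 epochs, compared with ten when the cycle length stays fixed at 10 epochs.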

Since CLR was not better than SGDR in my experiments, it would be very useful if SGDR could work with SWA.

Hi, sorry for the late reply. In our experiments a simple constant learning rate worked best, but it would definitely be interesting to see how SWA would work with SGDR.