lucidrains / lion-pytorch

🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in PyTorch


Did you also increase the decoupled weight decay when decreasing the learning rate?

xiangning-chen opened this issue

Thanks for implementing and testing our Lion optimizer!
Just wondering, did you also increase the decoupled weight decay to maintain the regularization strength?

best,
--xiangning
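
For context on the question above: with decoupled (AdamW-style) weight decay, each step shrinks the parameters by roughly `lr * weight_decay`, so the product of the two sets the effective regularization strength. If Lion is run with a learning rate several times smaller than AdamW's (as the paper suggests), the weight decay should be scaled up by a similar factor to keep that product constant. A minimal numeric sketch, with purely illustrative values not taken from the paper or this repo:

```python
# Decoupled (AdamW-style) weight decay shrinks each parameter by ~lr * weight_decay
# per step, so the product lr * weight_decay sets the regularization strength.
adamw_lr, adamw_wd = 3e-4, 1e-2   # hypothetical AdamW baseline settings

scale = 10                        # Lion typically uses a several-times-smaller learning rate
lion_lr = adamw_lr / scale
lion_wd = adamw_wd * scale        # raise weight decay by the same factor

# The effective per-step decay lr * weight_decay is preserved
assert abs(lion_lr * lion_wd - adamw_lr * adamw_wd) < 1e-12
```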

@xiangning-chen Hi Xiangning! Thank you for this interesting paper

So far I have only been testing with weight decay turned off. There are a lot of networks that are still trained with just plain Adam, and I wanted to see how Lion fares against Adam alone.

@xiangning-chen but yes, I have noted the section in the paper where you said the weight decay needs to be higher

Let me add that to the readme to increase the chances people train it correctly
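
As a rough sketch of what that readme guidance might look like in practice, assuming the `Lion` class exported by `lion_pytorch` follows the usual `torch.optim` constructor pattern (`params`, `lr=...`, `weight_decay=...`); the specific hyperparameter values below are illustrative only:

```python
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)

# Hypothetical AdamW baseline for comparison:
# opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)

# Lion: drop the learning rate and raise the decoupled weight decay by a similar factor
opt = Lion(model.parameters(), lr=1e-4, weight_decay=3e-2)

# standard training step
loss = model(torch.randn(8, 512)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```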

Thanks for the update!
Yeah, disabling weight decay for both optimizers is a meaningful and fair comparison, thank you!

@xiangning-chen ok good luck! hope this technique holds up to scrutiny!