juntang-zhuang / Adabelief-Optimizer

Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"


Changing init learning rate

Kraut-Inferences opened this issue · comments

Does modifying the initial learning rate hurt the algorithm in any way? I want to use exponential decay but don't know whether it would improve performance.

From my experience with a ViT model on ImageNet, AdaBelief improves over Adam when both use a default cosine learning rate schedule. I think it should work with other models as well.
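
If it helps, here is a minimal sketch of pairing AdaBelief with an exponential-decay schedule using PyTorch's built-in scheduler. It assumes the `adabelief-pytorch` package is installed; the model, learning rate, and decay factor below are placeholders, not recommended settings.

```python
# Minimal sketch: AdaBelief with an exponential learning-rate decay.
# Assumes `pip install adabelief-pytorch`; hyperparameters are illustrative only.
import torch
import torch.nn as nn
from adabelief_pytorch import AdaBelief

model = nn.Linear(10, 2)  # placeholder model
optimizer = AdaBelief(model.parameters(), lr=1e-3)

# Exponential decay: the learning rate is multiplied by `gamma` each epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(10):
    # ... replace with your real training loop ...
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch
```

Swapping `ExponentialLR` for `torch.optim.lr_scheduler.CosineAnnealingLR` gives the cosine schedule mentioned above.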

Thank you.