LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers

Home Page: https://arxiv.org/abs/2004.08249

Question about the adaptive optimizer

chenwydj opened this issue

Thanks for this great work!

I could not find more details about the adaptive optimizer mentioned in the paper. Could you point me to a reference or GitHub link for this adaptive optimizer?

Thank you!

Hi, thanks for reaching out.
In the paper, we use "adaptive optimizer" to refer to a class of optimizers that includes Adam, Adamax, RMSProp, RAdam, etc. In our experiments, we use RAdam as the optimizer.
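
For readers unfamiliar with the term, here is a minimal sketch of what "adaptive" means in practice. It implements a single-parameter Adam update in plain Python; it is not the code used in this repository, and the helper names (`adam_step`, `first_step_size`) are made up for illustration. It shows plain Adam rather than RAdam, which additionally rectifies the variance of the adaptive learning rate during the early steps.

```python
import math

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative only)."""
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # The step is rescaled by the running gradient magnitude of this parameter.
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

def first_step_size(grad, lr=1e-3):
    """Size of the very first update taken from param = 0.0 (hypothetical helper)."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    return abs(adam_step(0.0, grad, state, lr=lr))

# Gradients that differ by four orders of magnitude give nearly the same step.
small = first_step_size(0.01)
large = first_step_size(100.0)
print(small, large)
```

With plain SGD the two steps would differ by the same factor as the gradients; with an adaptive optimizer the step size is normalised per parameter, which is the property the term refers to.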