LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers

Home Page:https://arxiv.org/abs/2004.08249

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

whta's the meaning of 'adaptive-scale' argument?

gotobelieve opened this issue · comments

hi, I have read your original paper, and get trouble when I want to replicate the experiment, could you please explain the meaning of 'adaptive-scale' argument in options.py?

It's a debug argument to verify that our initialization can control the output variance to what we want. This parameter is not used in our experiments. It will be deleted later.