Doc error in the explanation of --lr-decay-inv-sqrt?
EtienneAb3d opened this issue · comments
Hi all,
In the documentation, I read:
--lr-decay-inv-sqrt: learning rate will be decreased at n / sqrt(no. updates) starting at n-th update
While trying to work out what value I should set for this parameter, I eventually concluded that the description should instead be:
lr-init * sqrt(n / no. updates)
Right?
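
To make the difference concrete, here is a minimal Python sketch comparing the two readings. It is not the actual implementation: the function names are hypothetical, the values of lr-init and n are only illustrative, and I am assuming the learning rate simply stays at lr-init before the n-th update.

```python
import math


def lr_as_documented(n, updates):
    """Literal reading of the doc text: n / sqrt(no. updates)."""
    return n / math.sqrt(updates)


def lr_as_proposed(lr_init, n, updates):
    """Proposed reading: lr-init * sqrt(n / no. updates).

    Assumption: before the n-th update the rate stays at lr_init.
    """
    if updates <= n:
        return lr_init
    return lr_init * math.sqrt(n / updates)


if __name__ == "__main__":
    lr_init = 0.0003  # illustrative value only
    n = 16000         # illustrative value only
    for updates in (16000, 32000, 64000):
        print(
            updates,
            lr_as_documented(n, updates),
            lr_as_proposed(lr_init, n, updates),
        )
```

With these values the literal reading does not depend on lr-init at all and gives a number far above any usable learning rate (sqrt(16000) at the n-th update), while the proposed reading starts at lr-init and halves it once the number of updates reaches 4 * n.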