karpathy / makemore

An autoregressive character-level language model for making more things

LayerNorm eps value

guglielmocamporese opened this issue

Hi!

Thanks for this little piece of juicy code!

Just out of curiosity: I noticed that your implementation uses nn.LayerNorm with the default eps=1e-5 (the constant added to the variance in the denominator), whereas other implementations (DINO [here] and the ViT in timm [here]) explicitly set eps=1e-6.

I know it is a small detail, but small details are sometimes very important for getting better models.

Do you think the model is sensitive to this kind of parameter change? Have you ever tried it or noticed a difference?
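To make the question concrete, here is a minimal sketch of where eps enters the computation. It is a plain-Python stand-in for the normalization step, (x - mean) / sqrt(var + eps) with the biased variance and no learnable scale or shift; it is not the makemore code. The helper names and the two input vectors are made up for illustration.

```python
import math

def layer_norm(x, eps):
    """Normalize a list of floats: (x - mean) / sqrt(var + eps), no affine part."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    denom = math.sqrt(var + eps)
    return [(v - mean) / denom for v in x]

def max_abs_diff(x):
    """Largest elementwise gap between the eps=1e-5 and eps=1e-6 outputs."""
    a = layer_norm(x, eps=1e-5)
    b = layer_norm(x, eps=1e-6)
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical inputs: one with variance of order 1, one with variance of order 1e-6.
typical = [0.5, -1.2, 2.0, 0.3]
tiny = [1e-3, -1e-3, 2e-3, 0.0]

diff_typical = max_abs_diff(typical)
diff_tiny = max_abs_diff(tiny)
print("variance ~ 1    :", diff_typical)
print("variance ~ 1e-6 :", diff_tiny)
```

In this sketch the two eps values give nearly identical outputs when the variance is of order 1, and only diverge noticeably when the variance is comparable to eps itself. Whether activations in the actual model ever get that small is exactly what I am unsure about. In PyTorch the value would be set with nn.LayerNorm(normalized_shape, eps=1e-6).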

Thanks!