vballoli / nfnets-pytorch

NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch. An explanation is available at tourdeml.github.io/blog/

Home page: https://nfnets-pytorch.readthedocs.io/en/latest/

Is Weight Standardization correct?

gellston opened this issue

Hi

First of all, thank you for sharing this valuable source code.

I'm reading through your code and noticed that your implementation of weight standardization differs from the original.

Original paper and GitHub:
https://paperswithcode.com/method/weight-standardization

[image: the original formulation]

Your implementation
[image: the formulation in this repository]

Did you misunderstand the formula at the time of implementation?
[image]

Could you confirm whether I have misunderstood something, or whether the two formulas are mathematically the same?

Thanks
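One way two weight standardization formulas can look different yet be mathematically identical is algebraic rearrangement. As a sketch, view the weight W as an O × I matrix (one row per output channel i, with I fan-in entries), take μ and σ as defined in the Weight Standardization paper, and assume (hypothetically) that an implementation computes the biased variance directly and folds 1/σ into a per-channel scale s_i. The two forms then agree:

$$
\begin{aligned}
\mu_{W_{i,\cdot}} &= \frac{1}{I}\sum_{j=1}^{I} W_{i,j}, \qquad
\sigma_{W_{i,\cdot}} = \sqrt{\frac{1}{I}\sum_{j=1}^{I} W_{i,j}^2 - \mu_{W_{i,\cdot}}^2 + \epsilon} \\
\frac{1}{I}\sum_{j=1}^{I}\left(W_{i,j} - \mu_{W_{i,\cdot}}\right)^2
&= \frac{1}{I}\sum_{j=1}^{I} W_{i,j}^2 - 2\mu_{W_{i,\cdot}}\,\frac{1}{I}\sum_{j=1}^{I} W_{i,j} + \mu_{W_{i,\cdot}}^2
= \frac{1}{I}\sum_{j=1}^{I} W_{i,j}^2 - \mu_{W_{i,\cdot}}^2 \\
\hat{W}_{i,j} = \frac{W_{i,j} - \mu_{W_{i,\cdot}}}{\sigma_{W_{i,\cdot}}}
&= s_i\,W_{i,j} - s_i\,\mu_{W_{i,\cdot}}, \qquad s_i = \frac{1}{\sigma_{W_{i,\cdot}}}
\end{aligned}
$$

So computing the variance directly instead of as a mean of squares minus the squared mean, or multiplying by a scale and subtracting a shift, leaves the result unchanged. A genuine difference would come from, for example, dividing by I − 1 instead of I.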

Sorry for the confusion. I checked again, and the two formulations are mathematically the same.
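To check this kind of equivalence numerically, here is a minimal pure-Python sketch. It is not code from this repository: `ws_paper` follows the Weight Standardization paper's definition, while `ws_scale_shift` is a hypothetical rearranged form (direct biased variance, then scale and shift). Both are applied per output channel to a random toy weight.

```python
import math
import random


def ws_paper(row, eps=1e-5):
    """Standardize one output channel as (W - mean) / sigma, with
    sigma = sqrt(mean(W^2) - mean^2 + eps), as in the WS paper."""
    n = len(row)
    mu = sum(row) / n
    sigma = math.sqrt(sum(w * w for w in row) / n - mu * mu + eps)
    return [(w - mu) / sigma for w in row]


def ws_scale_shift(row, eps=1e-5):
    """Hypothetical rearranged form: compute the biased variance directly,
    fold 1/sigma into a scale, and return W * scale - mean * scale."""
    n = len(row)
    mu = sum(row) / n
    var = sum((w - mu) ** 2 for w in row) / n  # biased: divide by n, not n - 1
    scale = 1.0 / math.sqrt(var + eps)
    shift = mu * scale
    return [w * scale - shift for w in row]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


random.seed(0)
# Toy conv weight flattened to (out_channels, in_channels * kh * kw) = (4, 27).
weight = [[random.gauss(0.0, 0.1) for _ in range(27)] for _ in range(4)]

for row in weight:
    # Each difference is at floating-point rounding level.
    print(max_abs_diff(ws_paper(row), ws_scale_shift(row)))
```

The printed differences are only rounding error. One practical pitfall: PyTorch's `torch.var` and `torch.std` use the unbiased (divide by N − 1) estimator by default, so an implementation built on them with default arguments would not match the paper's 1/I definition exactly.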