global layer normalization
fjiang9 opened this issue · comments
Hi, Efthymios
Thanks for sharing the code!
Is this code (sudo_rm_rf/sudo_rm_rf/dnn/models/sudormrf.py, line 116 at commit 5bf8c48) the "global layer normalization" mentioned in the paper?
I think this implementation is equivalent to nn.GroupNorm(1, nOut, eps=1e-08). Here is the test code:
import torch
import torch.nn as nn

x = torch.rand(2, 4, 3)
ln = nn.GroupNorm(1, 4, eps=1e-8)
print(ln(x))
# Equivalent to your implementation, except for the affine (gamma/beta) parameters:
print((x - x.mean(dim=[1, 2], keepdim=True))
      / (x.var(dim=[1, 2], keepdim=True, unbiased=False) + 1e-8).sqrt())
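For reference, the statistics that both versions compute can be written out with NumPy alone: global layer norm (gLN) normalizes each batch element over the channel and time dimensions jointly, then applies a per-channel affine transform. This is only a sketch of the math being discussed, not the repository's code; the names `global_layer_norm`, `gamma`, and `beta` are my own.

```python
import numpy as np

def global_layer_norm(x, gamma, beta, eps=1e-8):
    """Sketch of gLN: normalize over channels and time jointly,
    per batch element. x has shape (B, C, T)."""
    mean = x.mean(axis=(1, 2), keepdims=True)  # (B, 1, 1)
    var = x.var(axis=(1, 2), keepdims=True)    # biased variance, matching unbiased=False
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# With gamma=1 and beta=0 the output has (near-)zero mean and unit
# variance over dims (1, 2), which is exactly what nn.GroupNorm(1, C)
# produces with default (identity-initialized) affine parameters.
x = np.random.rand(2, 4, 3)
y = global_layer_norm(x, gamma=1.0, beta=0.0)
```

Since `nn.GroupNorm(1, C)` treats all `C` channels as a single group, its statistics are taken over the same (channel, time) axes, which is why the two implementations agree up to the affine parameters.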