joe-siyuan-qiao/WeightStandardization Issues
WS with ConvTranspose2d? (Updated)
changing the position of epsilon (Updated, 2)
It just explodes!!! (Updated, 1)
WD or WD+GN with fp16 (Updated, 3)
the loss is nan (Updated, 6)
WS of Deformable Convolution. (Updated, 3)
batch size and iteration (Updated, 2)
In my model, GN+WS is worse (Closed, 4)
about train and inference (Updated, 1)
Questions on accuracy gain of WS? (Closed, 2)
In my test, GN+WS result is worse (Closed, 2)
Conv1d Version? (Closed, 5)