Msra vs Xavier
ryanjay0 opened this issue · comments
I've noticed the only difference between the default resnet50_1by2 and your implementation (besides the number of classes) is the change of the weight_filler from msra to xavier, and of the bias_filler from constant to xavier, in the InnerProduct layer.
Was there a reason for that change? Maybe the small number of classes? Did it make a big difference?
I am assuming the default resnet50_1by2 is the one mentioned here. The choice of initialization does not make much difference while finetuning, since only the parameters of the last layer (FC_nsfw) are initialized; the rest are loaded from the pretrained model. The effect of initialization when training from scratch on ImageNet is more significant — you can refer to the corresponding papers for details.
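For reference, the two fillers differ mainly in the scale of the random weights. Below is a minimal NumPy sketch of what Caffe's "xavier" filler (uniform, fan-in scaled) and "msra" filler (Gaussian, He et al. scaling) produce with their default `variance_norm: FAN_IN` settings; the layer sizes here are illustrative, not taken from the model.

```python
import numpy as np

def xavier_fill(fan_in, fan_out, rng):
    # Caffe "xavier" filler default: uniform in [-a, a], a = sqrt(3 / fan_in)
    a = np.sqrt(3.0 / fan_in)
    return rng.uniform(-a, a, size=(fan_out, fan_in))

def msra_fill(fan_in, fan_out, rng):
    # Caffe "msra" filler (He et al. 2015): Gaussian, std = sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
w_xavier = xavier_fill(1024, 2, rng)  # hypothetical FC layer: 1024 -> 2
w_msra = msra_fill(1024, 2, rng)
print(w_xavier.std(), w_msra.std())
```

The msra filler yields a standard deviation roughly sqrt(2) times larger than xavier, which matters for very deep nets trained from scratch but, as noted above, is largely irrelevant when only one freshly initialized layer sits on top of pretrained weights.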