Msra vs Xavier
ryanjay0 opened this issue · comments
I've noticed the only difference between the default resnet50_1by2 and your implementation (besides the number of classes) is the change of the weight_filler from msra to xavier, and of the bias_filler from constant to xavier, in the InnerProduct layer.
Was there a reason for that change? Maybe the small number of classes? Did it make a big difference?
I am assuming the default resnet50_1by2 is the one mentioned here. The choice of initialization does not make much difference while finetuning, since only the parameters of the last layer (FC_nsfw) are initialized; the rest are loaded from the pretrained model. The effect of initialization when training from scratch on ImageNet is more significant — you can refer to the corresponding papers for details.
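For reference, the two fillers differ mainly in the scale of the random weights. Below is a minimal NumPy sketch of what Caffe's "xavier" filler (uniform, fan-in scaled) and "msra" filler (Gaussian, He et al. scaling) produce with their default `variance_norm: FAN_IN` settings; the layer sizes here are illustrative, not taken from the model.

```python
import numpy as np

def xavier_fill(fan_in, fan_out, rng):
    # Caffe "xavier" filler default: uniform in [-a, a], a = sqrt(3 / fan_in)
    a = np.sqrt(3.0 / fan_in)
    return rng.uniform(-a, a, size=(fan_out, fan_in))

def msra_fill(fan_in, fan_out, rng):
    # Caffe "msra" filler (He et al. 2015): Gaussian, std = sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
w_xavier = xavier_fill(1024, 2, rng)  # hypothetical FC layer: 1024 -> 2
w_msra = msra_fill(1024, 2, rng)
print(w_xavier.std(), w_msra.std())
```

The msra filler yields a standard deviation roughly sqrt(2) times larger than xavier, which matters for very deep nets trained from scratch but, as noted above, is largely irrelevant when only one freshly initialized layer sits on top of pretrained weights.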