[Question] kaiming initialization when pretrained=False

austinmw opened this issue 4 years ago

Hi, thanks for all these tutorials! At the very bottom of your 05_EfficientNet notebook, it looks like when pretrained=False is set, only the head of the model gets initialized. Am I reading correctly how this initialization is applied? If so, is that the correct way to do it, or should I change model[1] to model? Thanks!

Zach Mueller · Answer 1 · Thu Mar 12 2020 12:23:04 GMT+0800 (China Standard Time)

Yes, but we are also using pretrained weights there, so it doesn't matter in the long run (notice we load the old weights in). We don't train with the uninitialized body; instead we use the body from our other model.

Austin Welch · Answer 2 · Thu Mar 12 2020 12:35:33 GMT+0800 (China Standard Time)

I'm confused about the very last code cell in the notebook, but maybe I'm just overtired:

body = create_timm_body('efficientnet_b3a', pretrained=False)
head = create_head(3072, dls.c)
model = nn.Sequential(body, head)
apply_init(model[1], nn.init.kaiming_normal_)
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(), 
                splitter=default_split, metrics=accuracy)
learn.freeze()
learn.fit_one_cycle(5, 3e-3)

I would think that here, since the net is created with pretrained=False, you would use apply_init(model, nn.init.kaiming_normal_) and not freeze the network. I could be missing something though; I'm just trying to check my understanding.

Zach Mueller · Answer 3 · Thu Mar 12 2020 12:38:58 GMT+0800 (China Standard Time)

Aha! Totally my fault, my bad :) Yes, you are right. We should probably be initializing the whole thing there, not just the head (along with not freezing). I can try to get to it in the next few days, but a PR would be more than welcome 😊
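
For readers following along, the change agreed on here amounts to calling apply_init on model instead of model[1] and dropping learn.freeze(). The sketch below illustrates the same idea in plain PyTorch so that it runs without fastai or timm. The tiny body and head are hypothetical stand-ins for create_timm_body and create_head, and init_all_layers is an illustrative helper that only approximates what apply_init is understood to do (Kaiming-initialize the weights, zero the biases, skip normalization layers). It is not the notebook's actual fix.

```python
import torch
from torch import nn


def init_all_layers(model: nn.Module) -> None:
    """Kaiming-initialize every conv/linear layer in `model`.

    Hypothetical helper: normalization layers are left at their defaults,
    and biases (where present) are set to zero.
    """
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)


# Stand-ins for the un-pretrained body and the new head (assumed shapes).
body = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Sequential(nn.Linear(8, 4))
model = nn.Sequential(body, head)

# Initialize the whole model, not just model[1] (the head).
init_all_layers(model)

# Nothing is frozen: every parameter stays trainable from the first epoch.
all_trainable = all(p.requires_grad for p in model.parameters())

# Quick forward pass on a random batch to confirm the pieces fit together.
out = model(torch.randn(2, 3, 16, 16))
```

With no pretrained weights there is nothing worth protecting in the body, so freezing it would only train the head on top of random features. That is why initialization and training both cover the full model here.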

Austin Welch · Answer 4 · Thu Mar 12 2020 12:44:33 GMT+0800 (China Standard Time)

No problem, it took me a while to realize while playing with a very non-imagenet-like dataset :) Just glad I was understanding correctly! Will try to make a PR tomorrow.