SsisyphusTao / Object-Detection-Knowledge-Distillation

An object-detection knowledge-distillation framework powered by PyTorch, currently supporting SSD and YOLOv5.


The provided teacher checkpoint (VGG) doesn't match the current version of the VGG model in the repo.

TheLostIn opened this issue

    size mismatch for loc.0.weight: copying a param with shape torch.Size([16, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 1024, 3, 3]).
    size mismatch for loc.1.weight: copying a param with shape torch.Size([24, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([24, 512, 3, 3]).
    size mismatch for loc.2.weight: copying a param with shape torch.Size([24, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([24, 256, 3, 3]).
    size mismatch for loc.4.weight: copying a param with shape torch.Size([16, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([24, 256, 3, 3]).
    size mismatch for loc.4.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([24]).
    size mismatch for loc.5.weight: copying a param with shape torch.Size([16, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([24, 256, 3, 3]).
    size mismatch for loc.5.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([24]).
    size mismatch for conf.0.weight: copying a param with shape torch.Size([84, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([84, 1024, 3, 3]).
    size mismatch for conf.1.weight: copying a param with shape torch.Size([126, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 512, 3, 3]).
    size mismatch for conf.2.weight: copying a param with shape torch.Size([126, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 256, 3, 3]).
    size mismatch for conf.4.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 256, 3, 3]).
    size mismatch for conf.4.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([126]).
    size mismatch for conf.5.weight: copying a param with shape torch.Size([84, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 256, 3, 3]).
    size mismatch for conf.5.bias: copying a param with shape torch.Size([84]) from checkpoint, the shape in current model is torch.Size([126]).
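
For context (not from this thread): PyTorch emits exactly this kind of message whenever load_state_dict is called in its default strict mode and a tensor in the checkpoint has a different shape than the corresponding parameter in the freshly built model. A self-contained toy reproduction of the first loc.0.weight mismatch, using plain Conv2d layers rather than the repo's model classes:

    import torch
    import torch.nn as nn

    # Toy stand-in for the released checkpoint: a head conv that expects 512 input channels.
    saved_layer = nn.Conv2d(512, 16, kernel_size=3, padding=1)
    checkpoint = saved_layer.state_dict()

    # Toy stand-in for the current model: the matching layer now expects 1024 input channels.
    current_layer = nn.Conv2d(1024, 16, kernel_size=3, padding=1)

    # Strict loading (the default) requires identical shapes, so this raises:
    # RuntimeError: ... size mismatch for weight: copying a param with shape
    # torch.Size([16, 512, 3, 3]) from checkpoint, the shape in current model is
    # torch.Size([16, 1024, 3, 3]).
    current_layer.load_state_dict(checkpoint)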

Yep, the provided model is the original SSD model, which has 8732 anchors. You need to fine-tune it first to get a VGG model with 3000 anchors, and then start the distillation.
See my blog post https://zhuanlan.zhihu.com/p/260370225 for more details.
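
One common way to start that fine-tuning run (a sketch under my own assumptions, not the repo's script) is to copy over only the checkpoint tensors whose names and shapes still match the 3000-anchor model, essentially the VGG backbone, and let the re-sized loc/conf heads train from their random initialization:

    import torch

    def load_compatible_weights(model, checkpoint_path):
        """Copy into `model` only those checkpoint tensors whose names and shapes
        match; mismatched parameters (the loc/conf heads here) keep their init."""
        checkpoint = torch.load(checkpoint_path, map_location="cpu")
        model_state = model.state_dict()
        compatible = {
            k: v for k, v in checkpoint.items()
            if k in model_state and v.shape == model_state[k].shape
        }
        model_state.update(compatible)
        model.load_state_dict(model_state)  # strict load succeeds now
        return model

    # Usage idea (model builder and paths are placeholders, not names from this repo):
    # partially load the released 8732-anchor checkpoint into the 3000-anchor VGG model,
    # fine-tune on your dataset until the detection heads converge, then use that
    # checkpoint as the teacher for distillation.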

Please, for how many epochs should we fine-tune the VGG model before running the distillation?

Thank you