Size mismatch error for pos_embed
lxn96 opened this issue
Xiaonan Lu commented
We loaded our pretrained ViT-Base model (trained with the MAE method) and hit a size mismatch for pos_embed. Is there any solution to this problem, please?
RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
Yuxin Fang (方羽新) commented
Hello! To my knowledge, MAE uses a fixed 2D sin-cos positional embedding, while YOLOS uses a 1D absolute learnable positional embedding, so the two checkpoints have different pos_embed shapes.
I suggest replacing the original YOLOS pos embed with MAE's 2D sin-cos version.
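As a rough illustration of that suggestion, here is a minimal NumPy sketch of an MAE-style 2D sin-cos positional embedding. The function names mirror MAE's public implementation, but this is an illustrative sketch, not YOLOS's or MAE's actual code; the grid sizes below are assumptions (a 28x28 patch grid yields 784 patch tokens, which plus one [CLS] position would match the checkpoint's 785):

```python
import numpy as np

def get_1d_sincos_pos_embed(embed_dim, pos):
    """Sin-cos embedding for one axis; embed_dim must be even."""
    omega = np.arange(embed_dim // 2, dtype=np.float64)
    omega = 1.0 / 10000 ** (omega / (embed_dim / 2))
    out = np.einsum("m,d->md", pos, omega)               # (M, embed_dim/2)
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)

def get_2d_sincos_pos_embed(embed_dim, grid_h, grid_w):
    """Fixed (non-learnable) 2D sin-cos embedding over an h x w patch grid."""
    gw, gh = np.meshgrid(np.arange(grid_w, dtype=np.float64),
                         np.arange(grid_h, dtype=np.float64))
    # Half the channels encode the row index, half the column index.
    emb_h = get_1d_sincos_pos_embed(embed_dim // 2, gh.reshape(-1))
    emb_w = get_1d_sincos_pos_embed(embed_dim // 2, gw.reshape(-1))
    return np.concatenate([emb_h, emb_w], axis=1)        # (h*w, embed_dim)

# Hypothetical example: 28x28 grid of 768-dim patch tokens.
pos_embed = get_2d_sincos_pos_embed(768, 28, 28)
print(pos_embed.shape)  # (784, 768)
```

MAE prepends an all-zero row for the [CLS] token before loading this into the model's pos_embed buffer; since the embedding is fixed rather than learned, it can also be regenerated for whatever grid size the fine-tuning resolution implies, which sidesteps the shape mismatch above.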