hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Home Page: https://arxiv.org/abs/2106.00666


Size mismatch error for pos_embed

lxn96 opened this issue

We load our ViT-Base model pretrained with the MAE method, and we get a size mismatch for pos_embed. Is there a solution to this problem?

RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
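One common workaround (not from the maintainers) is to load the checkpoint with strict=False and drop the incompatible pos_embed so the rest of the backbone weights still transfer. A minimal sketch, assuming `model` is the already-constructed YOLOS VisionTransformer and `mae_pretrain_vit_base.pth` is the MAE checkpoint path (both names are illustrative):

```python
import torch

def load_mae_checkpoint(model, ckpt_path="mae_pretrain_vit_base.pth"):
    # Assumption: the checkpoint stores weights either directly or under a "model" key,
    # as MAE releases typically do.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)

    # Drop the incompatible positional embedding so the remaining weights load;
    # the model then keeps its own pos_embed (re-learned during fine-tuning).
    if "pos_embed" in state_dict and state_dict["pos_embed"].shape != model.pos_embed.shape:
        del state_dict["pos_embed"]

    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
    return model
```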

Hello! To my knowledge, MAE uses a 2D sin-cos positional embedding, while YOLOS uses a 1D absolute learnable positional embedding.
I suggest changing the original YOLOS pos embed to MAE's; a sketch is below.
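If you follow that suggestion, a minimal sketch of an MAE-style 2D sin-cos positional embedding is shown here. The function names and the 14x14 grid size are illustrative, not taken from the YOLOS code, and the [DET]-token positions that YOLOS appends would still need separate handling:

```python
import numpy as np
import torch

def get_1d_sincos_pos_embed(embed_dim, positions):
    # Standard sinusoidal embedding over a 1D set of positions: (P,) -> (P, embed_dim).
    omega = 1.0 / (10000 ** (np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)))
    out = np.einsum("p,d->pd", positions.reshape(-1), omega)   # (P, embed_dim/2)
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)  # (P, embed_dim)

def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False):
    # 2D sin-cos embedding: half of the channels encode the row, half the column.
    grid_h = np.arange(grid_size, dtype=np.float64)
    grid_w = np.arange(grid_size, dtype=np.float64)
    grid = np.stack(np.meshgrid(grid_w, grid_h), axis=0).reshape(2, -1)  # (2, G*G)
    emb_h = get_1d_sincos_pos_embed(embed_dim // 2, grid[0])
    emb_w = get_1d_sincos_pos_embed(embed_dim // 2, grid[1])
    pos_embed = np.concatenate([emb_h, emb_w], axis=1)  # (G*G, embed_dim)
    if cls_token:
        # Use an all-zero embedding for the [CLS] position, as MAE does.
        pos_embed = np.concatenate([np.zeros([1, embed_dim]), pos_embed], axis=0)
    return torch.from_numpy(pos_embed).float()

# Illustrative usage (shapes must match your model; YOLOS pos_embed also covers
# the [DET] tokens, so a direct copy like this only works for the patch + CLS part):
# pos_embed = get_2d_sincos_pos_embed(768, 14, cls_token=True)   # (197, 768)
# model.pos_embed.data.copy_(pos_embed.unsqueeze(0))
# model.pos_embed.requires_grad = False  # fixed, as in MAE
```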