hustvl / YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Home Page: https://arxiv.org/abs/2106.00666


Size mismatch error for pos_embed

lxn96 opened this issue

We load our ViT-Base model pretrained with the MAE method, and we get a size mismatch for pos_embed. Is there a solution to this problem?

RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
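One common workaround (not from the maintainers) is to load the checkpoint with strict=False and drop the incompatible pos_embed so the rest of the backbone weights still transfer. A minimal sketch, assuming `model` is the already-constructed YOLOS VisionTransformer and `mae_pretrain_vit_base.pth` is the MAE checkpoint path (both names are illustrative):

```python
import torch

def load_mae_checkpoint(model, ckpt_path="mae_pretrain_vit_base.pth"):
    # Assumption: the checkpoint stores weights either directly or under a "model" key,
    # as MAE releases typically do.
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)

    # Drop the incompatible positional embedding so the remaining weights load;
    # the model then keeps its own pos_embed (re-learned during fine-tuning).
    if "pos_embed" in state_dict and state_dict["pos_embed"].shape != model.pos_embed.shape:
        del state_dict["pos_embed"]

    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", missing)
    print("unexpected keys:", unexpected)
    return model
```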

Hello! To my knowledge, MAE uses a 2D sin-cos positional embedding, while YOLOS uses a 1D absolute learnable positional embedding.
I suggest changing the original YOLOS pos embed to MAE's; a sketch is below.
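If you follow that suggestion, a minimal sketch of an MAE-style 2D sin-cos positional embedding is shown here. The function names and the 14x14 grid size are illustrative, not taken from the YOLOS code, and the [DET]-token positions that YOLOS appends would still need separate handling:

```python
import numpy as np
import torch

def get_1d_sincos_pos_embed(embed_dim, positions):
    # Standard sinusoidal embedding over a 1D set of positions: (P,) -> (P, embed_dim).
    omega = 1.0 / (10000 ** (np.arange(embed_dim // 2, dtype=np.float64) / (embed_dim / 2.0)))
    out = np.einsum("p,d->pd", positions.reshape(-1), omega)   # (P, embed_dim/2)
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)  # (P, embed_dim)

def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False):
    # 2D sin-cos embedding: half of the channels encode the row, half the column.
    grid_h = np.arange(grid_size, dtype=np.float64)
    grid_w = np.arange(grid_size, dtype=np.float64)
    grid = np.stack(np.meshgrid(grid_w, grid_h), axis=0).reshape(2, -1)  # (2, G*G)
    emb_h = get_1d_sincos_pos_embed(embed_dim // 2, grid[0])
    emb_w = get_1d_sincos_pos_embed(embed_dim // 2, grid[1])
    pos_embed = np.concatenate([emb_h, emb_w], axis=1)  # (G*G, embed_dim)
    if cls_token:
        # Use an all-zero embedding for the [CLS] position, as MAE does.
        pos_embed = np.concatenate([np.zeros([1, embed_dim]), pos_embed], axis=0)
    return torch.from_numpy(pos_embed).float()

# Illustrative usage (shapes must match your model; YOLOS pos_embed also covers
# the [DET] tokens, so a direct copy like this only works for the patch + CLS part):
# pos_embed = get_2d_sincos_pos_embed(768, 14, cls_token=True)   # (197, 768)
# model.pos_embed.data.copy_(pos_embed.unsqueeze(0))
# model.pos_embed.requires_grad = False  # fixed, as in MAE
```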