Question about the `resize_pos_embed` function used when loading model weights for inputs of a different resolution.
SUNJIMENG opened this issue · comments
Thank you for your interesting work!
I have a question. When the input resolution differs from that of the pretrained ViT model, your paper says: "we bilinearly interpolate the pre-trained position embeddings according to their original position in the image to match the fine-tuning sequence length."
However, the `resize_pos_embed` function called from `timm.vision_transformer._load_weights()` uses "bicubic" interpolation rather than "bilinear". Have you verified that bilinear interpolation is better than bicubic interpolation?
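For reference, the core idea behind resizing position embeddings can be sketched as below. This is a simplified illustration, not timm's exact implementation: the function name, the `num_prefix_tokens` parameter, and the assumption of a square patch grid are choices made here for clarity. The `mode` argument makes it easy to compare "bilinear" against "bicubic".

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, new_grid, num_prefix_tokens=1, mode="bicubic"):
    """Resize ViT position embeddings to a new spatial grid.

    pos_embed: (1, num_prefix_tokens + H*W, dim) tensor; the prefix tokens
    (e.g. the class token) are kept as-is and only the spatial grid is
    interpolated. Assumes the original grid is square.
    """
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[:, num_prefix_tokens:]
    old_size = int(grid.shape[1] ** 0.5)  # square-grid assumption
    dim = grid.shape[-1]
    # (1, N, dim) -> (1, dim, H, W) so F.interpolate treats it as an image
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_grid, mode=mode, align_corners=False)
    # back to (1, new_H * new_W, dim) and re-attach the prefix tokens
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], dim)
    return torch.cat([prefix, grid], dim=1)

# Example: resize a 14x14 grid (ViT-B/16 at 224px) to 24x24 (384px)
pe = torch.randn(1, 1 + 14 * 14, 768)
out = resize_pos_embed(pe, (24, 24), mode="bicubic")
print(out.shape)  # torch.Size([1, 577, 768])
```

Switching `mode="bilinear"` is the only change needed to test the alternative from the paper.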
Hi @SUNJIMENG. I checked this at some point and did not observe a significant difference between bilinear and bicubic interpolation.