Question about the `resize_pos_embed` function used when loading model weights for inputs of a different resolution.
SUNJIMENG opened this issue · comments
Thank you for your interesting work!
I have a question. When the input resolution differs from that of the pretrained ViT model, your paper says: "we bilinearly interpolate the pre-trained position embeddings according to their original position in the image to match the fine-tuning sequence length."
However, the `resize_pos_embed` function called from `timm.vision_transformer._load_weights()` uses "bicubic" interpolation rather than "bilinear". Have you verified that bilinear interpolation is better than bicubic interpolation?
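For reference, the core idea behind resizing position embeddings can be sketched as below. This is a simplified illustration, not timm's exact implementation: the function name, the `num_prefix_tokens` parameter, and the assumption of a square patch grid are choices made here for clarity. The `mode` argument makes it easy to compare "bilinear" against "bicubic".

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, new_grid, num_prefix_tokens=1, mode="bicubic"):
    """Resize ViT position embeddings to a new spatial grid.

    pos_embed: (1, num_prefix_tokens + H*W, dim) tensor; the prefix tokens
    (e.g. the class token) are kept as-is and only the spatial grid is
    interpolated. Assumes the original grid is square.
    """
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[:, num_prefix_tokens:]
    old_size = int(grid.shape[1] ** 0.5)  # square-grid assumption
    dim = grid.shape[-1]
    # (1, N, dim) -> (1, dim, H, W) so F.interpolate treats it as an image
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_grid, mode=mode, align_corners=False)
    # back to (1, new_H * new_W, dim) and re-attach the prefix tokens
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], dim)
    return torch.cat([prefix, grid], dim=1)

# Example: resize a 14x14 grid (ViT-B/16 at 224px) to 24x24 (384px)
pe = torch.randn(1, 1 + 14 * 14, 768)
out = resize_pos_embed(pe, (24, 24), mode="bicubic")
print(out.shape)  # torch.Size([1, 577, 768])
```

Switching `mode="bilinear"` is the only change needed to test the alternative from the paper.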
Hi @SUNJIMENG. I checked this at some point and did not observe a significant difference between bilinear and bicubic interpolation.