rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Question about the "resize_pos_embed" function used when loading model weights for input images of a different resolution

SUNJIMENG opened this issue:

Thank you for your interesting work!

I have a question about inputs whose resolution differs from that of the pretrained ViT model. In your paper, the solution is: "we bilinearly interpolate the pre-trained position embeddings according to their original position in the image to match the fine-tuning sequence length".
However, the "resize_pos_embed" function called from timm.vision_transformer._load_weights() uses "bicubic" interpolation rather than "bilinear". Have you verified that bilinear interpolation works better than bicubic interpolation?

Hi @SUNJIMENG. I checked this at some point and did not observe a significant difference between bilinear and bicubic interpolation.
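
For anyone who wants to compare the two modes themselves, here is a minimal sketch of resizing ViT position embeddings with either interpolation. It is not the timm or Segmenter code: `resize_grid_pos_embed` is a hypothetical helper, and the 14x14 to 32x32 grid sizes (a 224 to 512 pixel input with 16-pixel patches), the single class token, and the random stand-in embedding are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def resize_grid_pos_embed(pos_embed, new_grid, num_prefix_tokens=1, mode="bicubic"):
    """Hypothetical helper: resample ViT position embeddings to a new patch grid.

    pos_embed: tensor of shape (1, num_prefix_tokens + old_size**2, dim);
               the old patch grid is assumed to be square.
    new_grid:  (new_h, new_w) target patch grid.
    mode:      "bilinear" or "bicubic", passed to F.interpolate.
    """
    # Prefix tokens (e.g. the class token) have no spatial position: keep as-is.
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[:, num_prefix_tokens:]

    old_size = int(round(grid.shape[1] ** 0.5))
    if old_size * old_size != grid.shape[1]:
        raise ValueError("expected a square grid of patch position embeddings")
    dim = grid.shape[-1]

    # (1, N, dim) -> (1, dim, H, W) so the embeddings can be resampled like an image.
    grid = grid.reshape(1, old_size, old_size, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_grid, mode=mode, align_corners=False)
    # Back to (1, new_h * new_w, dim).
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], dim)

    return torch.cat([prefix, grid], dim=1)


torch.manual_seed(0)
# Random stand-in for a pretrained embedding: 1 class token + 14x14 patches.
pos_embed = torch.randn(1, 1 + 14 * 14, 768)

bilinear = resize_grid_pos_embed(pos_embed, (32, 32), mode="bilinear")
bicubic = resize_grid_pos_embed(pos_embed, (32, 32), mode="bicubic")

print(bilinear.shape, bicubic.shape)
print("mean abs difference:", (bilinear - bicubic).abs().mean().item())
```

The printed difference on random values says little about real checkpoints, since pretrained position embeddings tend to vary smoothly across the grid. To reproduce the comparison discussed above, load actual pretrained weights, resize with each mode, and compare the fine-tuned segmentation metrics.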