rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Will the pretrained models work with a different image resolution?

jmgrn61 opened this issue · comments

Thanks for this great work.
I tried the training code with my own data (images with a resolution of 256 x 320) from scratch and it worked well. However, it crashed when I loaded the pretrained model file, which was originally trained on 500x500 images.
Is this normal for such transformer-based networks? I ask because I know that a pretrained CNN-based segmentation network (without a fully connected layer) is not affected by the input resolution, but I am not very familiar with transformers.

Hi @jmgrn61 ,
You can check how the Vision Transformer works by looking at the original paper. The model depends on the input image size because of the position embeddings. These are typically interpolated at test time, so it is possible to infer segmentation masks on images of varying sizes. It is also possible that your GPU does not have enough memory to handle 500x500 images, hence the crash.