rstrudel / segmenter

[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Performance of Seg-B/16 on CityScapes using AugReg initialization

YiF-Zhang opened this issue · comments

commented

Hi, thanks for the excellent work! I notice that in your paper, the Seg-B/16 trained on Cityscapes is initialized from a DeiT pre-trained model (rather than AugReg). In my own experiments, Seg-B/16 (and my own model based on ViT-Base) with AugReg initialization performs quite badly on Cityscapes (73.2 mIoU), while Seg-S/16 performs well (76.2 mIoU). So I wonder whether you got similar results, and whether you could share more information about your choice of initialization for the Seg-B/16 model?
Many thanks.

Hi @YiF-Zhang ,

On Cityscapes, when initializing the Seg-B/16 backbone with ViT-B AugReg weights, we get a mIoU of 77.46 for single-scale inference and 79.60 for multi-scale inference. Performance is thus better when initializing with DeiT-B weights than with ViT-B AugReg weights. But so far we have always had Seg-B/16 > Seg-S/16 in terms of performance.

For the weights that are trained from scratch we use the following function https://github.com/rstrudel/segmenter/blob/master/segm/model/utils.py#L12-L19 to initialize them. I hope this helps!

I am closing this issue, feel free to reopen it if you have more problems!
Robin

commented

Hi @rstrudel , I have successfully reproduced your result. Thanks for the info!

Still, I have another question about inference speed. How did you measure your FPS? I used the script from mmsegmentation and got only ~29 FPS for the tiny model on a Tesla V100. Can you provide more info about this? Thanks a lot.

Hi @YiF-Zhang , we computed throughput in the paper, i.e. we measured the speed of processing batches of images, which is different from FPS. We are currently working on a PR of Segmenter into mmsegmentation (this is ongoing work, so I would not recommend using it for now; it is probably better to stick with this repo). I computed FPS using their script; the results are here:
https://github.com/rstrudel/mmsegmentation/tree/master/configs/segmenter .
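To make the throughput/FPS distinction concrete, here is a minimal sketch, not the script used in the paper or in mmsegmentation. `fake_model`, `measure_fps` and `measure_throughput` are hypothetical names, and the "model" is a plain Python stand-in so the example runs without a GPU. FPS times one image per call; throughput times whole batches and divides by the total number of images.

```python
import time

def fake_model(batch):
    # Hypothetical stand-in for a segmentation forward pass (assumption).
    return [sum(image) for image in batch]

def measure_fps(model, image, n_iters=50):
    """Images per second when the model sees one image per call."""
    start = time.perf_counter()
    for _ in range(n_iters):
        model([image])
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

def measure_throughput(model, image, batch_size=8, n_iters=50):
    """Images per second when the model sees a full batch per call."""
    batch = [image] * batch_size
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

image = list(range(1000))  # dummy "image"
print("FPS:", measure_fps(fake_model, image))
print("throughput (img/s):", measure_throughput(fake_model, image))
```

On a real GPU model the two numbers can differ a lot, because batching amortizes per-call overhead. GPU timing also needs a device synchronization before reading the clock, which this CPU-only sketch leaves out.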

Segmenter is among the most competitive models in terms of mIoU/FPS trade-off, outperforming, for example, the Swin-Transformer+UperNet approach by quite some margin: https://github.com/open-mmlab/mmsegmentation/tree/master/configs/swin .

On Cityscapes, images are 768x768, so the resolution is quite high, and the quadratic cost of attention in the number of patches makes the model process images more slowly than on ADE20K, where the resolution is 512x512.
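As a rough back-of-the-envelope check of the resolution argument, the sketch below counts patch tokens and compares the quadratic attention term for the two resolutions. It assumes a patch size of 16 (as in the "/16" model names) and counts only the token-to-token attention term, ignoring the linear layers; `num_tokens` and `attention_cost_ratio` are hypothetical helper names.

```python
def num_tokens(height, width, patch_size=16):
    # Number of non-overlapping patches (tokens) for an image.
    return (height // patch_size) * (width // patch_size)

def attention_cost_ratio(res_a, res_b, patch_size=16):
    # Self-attention compares every token with every other token,
    # so its cost grows with the square of the token count.
    n_a = num_tokens(res_a, res_a, patch_size)
    n_b = num_tokens(res_b, res_b, patch_size)
    return (n_a / n_b) ** 2

print(num_tokens(768, 768))            # 2304 tokens
print(num_tokens(512, 512))            # 1024 tokens
print(attention_cost_ratio(768, 512))  # 5.0625
```

Under these assumptions, a 768x768 input has 2.25 times as many tokens as a 512x512 input, and the attention term alone is about 5 times more expensive.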