NoelShin / reco

[NeurIPS'22] ReCo: Retrieve and Co-segment for Zero-shot Transfer

Home Page: https://www.robots.ox.ac.uk/~vgg/research/reco/

Problem of evaluating ReCo on CityScapes

xs1317 opened this issue

Hello, thanks for sharing your fantastic work.

I evaluated ReCo on the Cityscapes validation split but didn't get the same result as in your paper. Here are my results and configs:

Results: deit_s_16_sin_in_train_ce_ta_dc/results_crf.json

"Pixel Acc": 0.7456942998704943, "Mean IoU": 0.19347486644280978
I get a better Pixel Acc and a worse mIoU.
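
For reference, here is a minimal sketch of how I compute these two metrics from a confusion matrix; it is my own code rather than this repo's evaluation code, which may differ in details such as ignored classes:

    import numpy as np

    def metrics_from_confusion(conf: np.ndarray):
        """Pixel accuracy and mean IoU from a (C, C) confusion matrix,
        where conf[gt, pred] counts pixels."""
        tp = np.diag(conf)
        pixel_acc = tp.sum() / conf.sum()
        # Per-class IoU = TP / (TP + FP + FN); skip classes that never occur.
        union = conf.sum(axis=0) + conf.sum(axis=1) - tp
        valid = union > 0
        mean_iou = (tp[valid] / union[valid]).mean()
        return pixel_acc, mean_iou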

configs:

        imagenet_split: "train"  
        dataset_name: "cityscapes"  
        clip_arch: "ViT-L/14@336px" 
        dense_clip_arch: "RN50x16"  
        dense_clip_inference: true
        encoder_arch: "DeiT-S/16-SIN"  
        patch_size: 16
        context_categories: ["tree", "sky", "building", "road", "person"]
        context_elimination: true
        text_attention: true 

I directly downloaded the reference image embeddings for Cityscapes and the pre-trained models.
The problem should not be caused by image preprocessing, because I got matching results when evaluating ReCo+ on Cityscapes.
Would you help me solve this problem? Thanks!
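
In case it helps with debugging, this is roughly how I inspected the downloaded embedding file; the filename and the assumption that it is a dict of tensors are mine, since I don't know the file's exact layout:

    import torch

    # Hypothetical filename; I don't know the exact structure of the
    # released file, so this just prints whatever is inside.
    embeddings = torch.load("cityscapes_reference_embeddings.pt", map_location="cpu")
    if isinstance(embeddings, dict):
        for key, value in embeddings.items():
            print(key, getattr(value, "shape", type(value)))
    else:
        print(type(embeddings), getattr(embeddings, "shape", None))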

Hi @xs1317 and thank you for your kind words and interest in our work.

I'm having difficulties understanding your question as it seems you got the same results as the ones reported in the paper after rounding, i.e., 74.6 and 19.3 for pixel accuracy and mIoU (in percent).
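
Concretely, rounding your raw fractions to one decimal place in percent reproduces the paper's numbers:

    # Rounding the raw fractions from results_crf.json to the paper's precision.
    print(round(0.7456942998704943 * 100, 1))   # 74.6 (pixel accuracy)
    print(round(0.19347486644280978 * 100, 1))  # 19.3 (mIoU)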

Could you please double-check whether your question is correct, or whether by any chance you got the numbers confused with another dataset such as KITTI-STEP?

Kind regards,
Noel

Sorry, I was too quick to reply to you earlier - I noticed that the numbers you compared to are the ones in the revised version rather than the arXiv version. The difference is: in the new version, ReCo was evaluated at original resolutions using the inference code of Drive&Segment (https://github.com/vobecant/DriveAndSegment) for a fair comparison, whereas ReCo+ is evaluated at 320×320 pixels as in STEGO (https://arxiv.org/abs/2203.08414). As the image pre-processing used in this repo is for the latter case, there are some differences as you noted.
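
For illustration, the two settings differ roughly as below; the exact transforms live in the Drive&Segment and STEGO code, so the normalisation constants (ImageNet statistics) and the interpolation mode here are placeholders:

    import torchvision.transforms as T
    from torchvision.transforms import InterpolationMode

    # ReCo (revised version): evaluate at the original image resolution,
    # only converting to a tensor and normalising.
    reco_eval = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # ReCo+ (and this repo's pre-processing): resize to 320x320 as in STEGO.
    reco_plus_eval = T.Compose([
        T.Resize((320, 320), interpolation=InterpolationMode.BILINEAR),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])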

Noel

Hi @NoelShin, thanks for your reply.
I read the arXiv version and got the same result on Cityscapes. There's another problem when I evaluated ReCo on the KITTI-STEP validation split: the result is 72.0 and 31.0 for pixel accuracy and mIoU (in percent), which is better than the numbers in the arXiv version, and I can't find any reported result matching it.
The exact results are "Pixel Acc": 0.7199210254853453, "Mean IoU": 0.30981611409230164.
The configs are the same as in my original issue, and the result of ReCo+ on KITTI-STEP matches the result in the arXiv version.

Thank you very much for letting me know - I could confirm that the embedding file produces the numbers you got. To double-check the code, I re-implemented the computation of the image embeddings for KITTI-STEP and, in that case, obtained the correct numbers as reported in our paper. (I think I mistakenly uploaded a different embedding file from some other setting I tried during the ablation studies.) Please download the embedding file again and verify that you get the exact numbers. If not, please let me know again. :)

Download link
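
If you want to be sure you are reading the re-uploaded file rather than a cached copy, hashing the file is a quick check (the filename is a placeholder, and there is no official checksum, so compare against a freshly downloaded copy):

    import hashlib

    def sha256sum(path: str) -> str:
        """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    print(sha256sum("kitti_step_reference_embeddings.pt"))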

Hi @NoelShin, thanks for your help.
I downloaded the embedding again and the result now matches your paper. I'm curious which setting the original embedding file came from; if it's convenient, please let me know.