icoz69 / CaNet

The code for paper "CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning"


model does not upsample predictions

SMHendryx opened this issue

In the paper you note that: "Our final result is achieved by bilinearly upsampling the confidence map to the same spatial size of the query image and classifying each location according to the confidence maps." As far as I can tell from the code, your model is not upsampling; the predictions come from here:

```python
out = self.layer9(out)
```

How are you evaluating the results when the predictions and labels are of different shapes? Do you downsample the masks to the size of the predictions, or upsample the predictions to the size of the masks in other, unreleased code?

Also, during training and eval, do you first resize the images and masks before passing them into the graph? Or do you use the original PASCAL image dimensions?

This could relate to the difficulties in producing the reported results in #4.

Hello, I have updated the training scripts now. The non-learnable upsampling is done outside the forward function of the network. For the final evaluation, all metrics, i.e. meanIoU and FB-IoU, are based on the original image size.
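For concreteness, here is a minimal sketch of that flow, assuming a model whose forward returns an (N, C, h, w) confidence map at reduced resolution; the function and argument names below are illustrative, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def predict_full_resolution(model, query, support, support_mask, original_hw):
    """Run the network, then bilinearly upsample the confidence map to the
    query image's original (H, W) before classifying each location."""
    with torch.no_grad():
        logits = model(query, support, support_mask)   # (N, C, h, w), reduced size
    # Non-learnable bilinear upsampling, applied outside forward()
    logits = F.interpolate(logits, size=original_hw,
                           mode='bilinear', align_corners=True)
    return logits.argmax(dim=1)                        # (N, H, W) label map
```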

Hi @icoz69, thank you for getting back to me. It's cool to see some more of the training details. It looks like you are training and evaluating with input_size = (321, 321), not the original image shapes, which are not all 321 by 321, correct? Also, it looks like you are saving the best model found during training (on the 15 training classes) by evaluating on the test set of 5 held-out classes here:

CaNet/train.py, line 269 in fdce946:

```python
torch.save(model.cpu().state_dict(), osp.join(checkpoint_dir, 'model', 'best' '.pth'))
```

Is that correct?
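For readers following along, here is a minimal sketch of the input pipeline implied by input_size = (321, 321): images are resized to a fixed shape for the network, while the metric is computed against the mask at its original size. The transform names and file path are illustrative assumptions, not the repo's actual data code:

```python
from PIL import Image
from torchvision import transforms

input_size = (321, 321)
to_net = transforms.Compose([
    transforms.Resize(input_size),   # fixed network input size
    transforms.ToTensor(),
])

img = Image.open('image1.jpg').convert('RGB')   # hypothetical example image
original_hw = img.size[::-1]         # PIL .size is (W, H); metrics need (H, W)
query = to_net(img).unsqueeze(0)     # (1, 3, 321, 321) tensor for the network
# ...run the model on `query`, then upsample the logits back to original_hw
# (as in the sketch above) before comparing against the original-size mask.
```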

Thanks for the additional details @icoz69. One more question about the dataset that would be helpful for people to know: how did you combine the examples from SBD and PASCAL to make the PASCAL-5^i dataset? Given that there is some amount of overlap, did you just use SBD? Or did you simply put the images and masks from both PASCAL and SBD into the same data directory?

I ask because if you simply put both datasets into the same directory, there may be some amount of overlap, meaning you will sometimes have the same images in the sampled few-shot training and validation sets.

Or, when an image-mask pair appears in both parent datasets, do you prefer one over the other? I.e., if the mask for image1.jpg is in both SBD and PASCAL, do you just take the pair from SBD?

There is a split of SBD that adopts the same val set as VOC 2012; that's the split we use. You can simply combine all the training images from the two datasets.
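A hedged sketch of that merge, assuming the standard SBD (benchmark_RELEASE) and VOCdevkit directory layouts: take the union of the two training ID lists and drop anything that appears in the VOC 2012 val list, so no evaluation image leaks into a sampled few-shot training episode.

```python
def read_ids(path):
    """Read one image ID per line from a split file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

# Paths below assume the standard SBD and VOCdevkit layouts.
sbd_train = read_ids('benchmark_RELEASE/dataset/train.txt')
voc_train = read_ids('VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt')
voc_val   = read_ids('VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt')

# Union of the two training splits, with every VOC 2012 val image removed.
train_ids = (sbd_train | voc_train) - voc_val
val_ids   = voc_val
```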

Thanks for the clarification @icoz69. I am closing this issue, as it is now clearer how to reproduce your work.

You are welcome. Good luck with your project.