xyupeng / ContrastiveCrop

[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Some questions about the paper

Khoa-NT opened this issue

Thank you for an interesting and easy-to-understand paper.
May I ask a few questions?

1/ I still don't understand what the class score mentioned in Section 3.4 is.
Could you explain it in more detail?
I checked the code but couldn't find it. Please correct me if I missed it.

2/ It's interesting that the learning rate for training the linear classifier is 10.
Do you have any findings on this, or is it a heuristic configuration?

3/ What is the red plot in Section 4.4, Ablation Studies / Semantic-aware Localization?

We also make comparison with RandomCrop that does not use localization (i.e., k = 0), and ground
truth bounding boxes (the red plot).

Is it another experiment that was removed from Fig. 6(a)?

Thank you

Hi, Khoa-NT
Thank you for your interest and your questions.

  1. Sorry for the confusion. By class score we mean the class probability after softmax (a real number within (0, 1)). We get the class score by inputting a crop to a standard ResNet50 trained with ImageNet labels. We didn't put it in the code since it is not the main experiment.
  2. The linear classifier learning rate is adapted from MoCo, where the linear classification lr is 30.0. We did a little parameter tuning to make it suitable for all models on small datasets (see the illustrative sketch after this list).
  3. It's a mistake that we did not remove the latter half of the sentence. Please ignore it.
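
To make point 2 more concrete, here is a minimal linear-probe sketch assuming a frozen ResNet-50 backbone trained with SGD. The lr=10.0 value follows the answer above; the remaining hyperparameters (momentum, weight decay, class count) are illustrative assumptions, not the repository's actual configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50()      # contrastively pretrained weights would be loaded here
backbone.fc = nn.Identity()       # drop the supervised classification head
for p in backbone.parameters():
    p.requires_grad = False       # freeze the backbone; only the probe is trained

classifier = nn.Linear(2048, 10)  # e.g. 10 classes for a small dataset such as CIFAR-10

optimizer = torch.optim.SGD(
    classifier.parameters(),
    lr=10.0,                      # tuned down from MoCo's 30.0, per the answer above
    momentum=0.9,                 # assumed value, common for linear probes
    weight_decay=0.0,             # assumed; linear probes often use no weight decay
)
```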

Hi @xyupeng,
Thank you for the details and congratulations on the Oral paper.

Regarding 1):

Sorry for the confusion. By class score we mean the class probability after softmax (a real number within (0, 1)). We get the class score by inputting a crop to a standard ResNet50 trained with ImageNet labels. We didn't put it in the code since it is not the main experiment.

If I understand correctly, the class score is the argmax class probability of the prediction (after softmax).
Did you check whether the predicted class corresponding to that probability matches the ground truth?
I just wonder: if the predicted class were wrong, then maybe the semantic information would not be useful.

The class score is the probability at the index of the ground-truth class of that crop/image; it's not the argmax index. We use this score as an indicator of how much category-level semantic information the input crop contains.
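
To make the distinction concrete, below is a minimal sketch (not from the repository) of computing such a class score with a supervised ImageNet ResNet-50: the score is read at the ground-truth class index rather than taken from the argmax prediction. The file name, label value, and preprocessing are assumptions for illustration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Standard supervised ImageNet ResNet-50, as described in the answer above.
model = models.resnet50(weights="IMAGENET1K_V1").eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

crop = Image.open("crop.jpg").convert("RGB")  # hypothetical crop of an ImageNet image
gt_class = 207                                # hypothetical ground-truth ImageNet label

with torch.no_grad():
    logits = model(preprocess(crop).unsqueeze(0))  # shape (1, 1000)
    probs = torch.softmax(logits, dim=1)

# The class score reads the probability at the ground-truth index,
# not the probability of the argmax prediction.
class_score = probs[0, gt_class].item()
```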

Thank you for clarifying. I got it.