xyupeng / ContrastiveCrop

[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Some questions about the running process of contrastive cropping

evilemogod opened this issue · comments

Hi author, contrastive cropping is a very good idea. In a similar project based on MoCo v3, we used the last-layer feature map or the attention map to produce the crop bounding box with a threshold of 0.1, but both gave poor results. We update the bounding box every 20 epochs, and printing the box at epoch 300 gives h_min=0, w_min=0, h_max=0.9955, w_max=0.9955. This indicates that all pixels in the produced heatmap exceed the threshold, so the box spans the whole image and contrastive cropping has no effect. Should we enlarge the threshold?

In addition, we observe that only the left and bottom boundaries of the crop are related to the heatmap, while the top and right boundaries follow from the height and width, which are uniformly sampled according to the given scale and ratio. If only the left and bottom boundaries are constrained by the box and the top and right boundaries are effectively random, how can we guarantee that the main object in the image lies inside the crop? This is the point that puzzles us. We hope you can answer the questions above; thank you very much.
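(For reference, the thresholding step described above can be sketched as follows. This is a minimal illustration under assumed conventions, not the repository's actual code; `heatmap_to_bbox`, the tensor shape, and the normalization are assumptions.)

```python
import torch

def heatmap_to_bbox(heatmap: torch.Tensor, thresh: float = 0.1):
    """Derive a normalized bounding box from an activation heatmap.

    heatmap: (H, W) tensor, assumed already normalized to [0, 1].
    Returns (h_min, w_min, h_max, w_max) as fractions of the image size.
    """
    H, W = heatmap.shape
    ys, xs = torch.nonzero(heatmap > thresh, as_tuple=True)
    if ys.numel() == 0:
        # Nothing clears the threshold: fall back to the full image.
        return 0.0, 0.0, 1.0, 1.0
    # Index-based maxima approach but never reach 1.0, e.g.
    # 223/224 ≈ 0.9955 on a 224-pixel axis -- consistent with the values
    # reported above when every pixel clears the threshold.
    return (ys.min().item() / H, xs.min().item() / W,
            ys.max().item() / H, xs.max().item() / W)
```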

Hi @evilemogod,
thanks for your questions. If you find that thresh=0.1 is too small, it is worth trying larger values such as 0.3 or 0.5. In our experience, 0.5 is large enough, because anything above 0.5 leaves a very small operable region and little variance between crops.

For the second question, it is equivalent to think of it this way: we restrict the center of the crop to lie inside the box and sample its position over the box area from a beta distribution, while the height and width of the crop remain uniformly sampled. This way the crop contains both the object and some relevant background information.
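To make that sampling scheme concrete, here is a minimal sketch of a center-constrained crop. `sample_crop`, the default `alpha`, and the retry/fallback behavior are illustrative assumptions, not the repository's exact implementation:

```python
import math
import random

def sample_crop(box, img_h, img_w,
                scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3), alpha=0.1):
    """Sample (top, left, h, w) with the crop center inside `box`.

    box: (h_min, w_min, h_max, w_max) in normalized coordinates,
    e.g. the output of a heatmap-thresholding step.
    Height/width are uniformly sampled from `scale`/`ratio`, as in
    torchvision's RandomResizedCrop; the center is drawn from a
    Beta(alpha, alpha) distribution over the box (alpha < 1 gives a
    U-shape that pushes centers toward the box edges).
    """
    h_min, w_min, h_max, w_max = box
    for _ in range(10):
        # Uniform area fraction and log-uniform aspect ratio.
        area = img_h * img_w * random.uniform(*scale)
        aspect = math.exp(random.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(area * aspect)))
        h = int(round(math.sqrt(area / aspect)))
        if w > img_w or h > img_h:
            continue
        # Beta-distributed center constrained to the heatmap box.
        cy = (h_min + (h_max - h_min) * random.betavariate(alpha, alpha)) * img_h
        cx = (w_min + (w_max - w_min) * random.betavariate(alpha, alpha)) * img_w
        top, left = int(round(cy - h / 2)), int(round(cx - w / 2))
        if 0 <= top and top + h <= img_h and 0 <= left and left + w <= img_w:
            return top, left, h, w
    # Fallback after repeated failures: a plain center crop.
    s = min(img_h, img_w)
    return (img_h - s) // 2, (img_w - s) // 2, s, s

# Example: crop an assumed 224x224 image around a box covering the center.
print(sample_crop((0.2, 0.2, 0.8, 0.8), 224, 224))
```

Because the center is constrained to the box, every crop overlaps the object region regardless of the uniformly sampled height and width, which is why the left/bottom versus top/right asymmetry in the question does not lose the object.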

Thank you for your suggestions and responses. I see now.