WXinlong / DenseCL

Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021 Oral.

Is it possible to gain dense correspondence from the known data augmentation?

lilanxiao opened this issue · comments

Hi, thank you very much for the nice work!

I have a question about the dense correspondence between views. In the paper, the correspondence is obtained by computing the similarity between feature vectors from the backbone. Since the data augmentations (e.g. rotation, cropping, flipping) applied to each view of the same image are known, the correspondence could also be obtained directly from these transformations.

For example, suppose Image A is a left-right flipped copy of Image B. The two images are encoded into 3x3 feature maps, which can be written as:

fa1, fa2, fa3
fa4, fa5, fa6
fa7, fa8, fa9

and

fb1, fb2, fb3
fb4, fb5, fb6
fb7, fb8, fb9

Since A and B are flipped views of the same image, the correspondence would be (fa1, fb3), (fa2, fb2), (fa3, fb1), ... .
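The flipped-view correspondence above can be sketched in a few lines. This is a hypothetical illustration (the function name and row-major indexing are my own, not from the DenseCL code): each cell (r, c) in view A maps to cell (r, w-1-c) in the horizontally flipped view B.

```python
# Hypothetical sketch: dense correspondence derived directly from a
# known left-right flip, for an h x w feature map whose cells are
# indexed 0..h*w-1 in row-major order.
def flip_correspondence(h, w):
    """Return (index_in_A, index_in_B) pairs: cell (r, c) in view A
    corresponds to cell (r, w - 1 - c) in the flipped view B."""
    pairs = []
    for r in range(h):
        for c in range(w):
            pairs.append((r * w + c, r * w + (w - 1 - c)))
    return pairs

pairs = flip_correspondence(3, 3)
# Matches the example above: fa1<->fb3, fa2<->fb2, fa3<->fb1, ...
print(pairs[:3])  # [(0, 2), (1, 1), (2, 0)]
```

The same idea extends to crops (map grid cells through the crop box offsets) and rotations, as long as the augmentation parameters are recorded.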

From my perspective, the transformation-based correspondence is more straightforward, but the paper doesn't use it. Is there any intuition behind this choice?

Thank you again!

Hi, I notice there is already a paper with a similar idea (https://arxiv.org/pdf/2011.10043.pdf). Please correct me if I misunderstood what you mean.

Yes, using the geometric transformations is a straightforward way. In our framework, the two approaches achieve almost the same results. This part of the experiments will be added in the next version of the paper.

As discussed in our paper, our proposed method is simpler and more flexible.
Please refer to the paper for a detailed discussion (the end of Sec. 1.1, Related Work, "Pre-training for dense prediction tasks").
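For comparison, the similarity-based matching described in the question (each backbone feature vector in one view matched to its most similar vector in the other) can be sketched as follows. This is a minimal numpy illustration, not the actual DenseCL implementation; the function name and shapes are assumptions for the example.

```python
import numpy as np

def similarity_correspondence(feat_a, feat_b):
    """feat_a, feat_b: (N, D) arrays of N spatial feature vectors.
    Match each vector in A to the most similar vector in B by
    cosine similarity (argmax over B for each row of A)."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T              # (N, N) cosine-similarity matrix
    return sim.argmax(axis=1)  # index in B matched to each cell of A

# Toy check: if B is A with its cells reversed (a flipped 1x3 map),
# similarity matching recovers the flip correspondence.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(3, 8))
feat_b = feat_a[::-1]
print(similarity_correspondence(feat_a, feat_b))  # [2 1 0]
```

Unlike the transformation-based variant, this matching needs no record of which augmentations were applied, which is one sense in which it is the more flexible option.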

Hi, thank you for the pointer! I hadn't read that paper before; it looks interesting. And yes, that's what I mean.

Hi, thank you for your reply. Yeah, I get your point. I'm looking forward to the updated version.
The issue is closed.