CompVis / metric-learning-divide-and-conquer

Source code for the paper "Divide and Conquer the Embedding Space for Metric Learning", CVPR 2019

Why the need to divide feature space into d/K

avn3r-dn opened this issue

Currently, the paper suggests dividing the d embedding dimensions into K feature spaces; with d=128 and K=8, the embedding per cluster is 16-dimensional. I understand this was mainly done to keep the comparison of embedding sizes against other models fair. But theoretically, can't we just use a new embedding space of d*K = 1024? If so, have you experimented with how the two compare: 128d vs. 1024d, where one uses d/K dimensions per cluster (sketched below) and the other uses the entire embedding space per cluster?

Regards.
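
For concreteness, here is a minimal PyTorch sketch of the split the question describes. The dimensions (d=128, K=8) come from the thread; the batch size and tensor names are assumptions for illustration, not the repo's actual code.

```python
import torch

# Dimensions from the thread: d = 128, K = 8 learners -> d/K = 16 dims each.
d, K = 128, 8
chunk = d // K

x = torch.randn(32, d)             # a batch of 32 embeddings (batch size assumed)
# Slice the d-dim embedding into K non-overlapping sub-embeddings,
# one per learner/cluster.
subspaces = x.split(chunk, dim=1)  # tuple of K tensors, each of shape (32, 16)
assert len(subspaces) == K and subspaces[0].shape == (32, chunk)
```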

Hi,

A 1024d embedding space would obviously give better results due to its larger capacity.

Best,
Artsiom

Hey Artsiom,

Sorry, I didn't explain my question well enough. My question is: why do we need to divide the feature space into K chunks? I understand the value of the K clusters and K losses, but what do we gain from dividing the feature space into K chunks? The only gain I can see is keeping the feature space the same size as the original.

Assuming the number of features is not a constraint, can't I just skip splitting the features into K chunks? For example, you use 128d and divide it into 8 chunks of 16 dimensions, which you then merge at the output. Can't I just use the full 128d per cluster directly and then merge them into 128d x 8 dimensions?

Regards.

Yes, you can use 128d x 8, and it is the same as splitting 1024d into 8 parts :). And, as you pointed out, the comparison to the 128d baselines would no longer be fair.
But if you now compare 128d x 8 trained with the proposed method against a 1024d baseline, then our method should give better results.

Best,
Artsiom
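
A small sketch of the equivalence Artsiom describes: K independent 128d learners concatenated at the output are exactly one 1024d embedding split into K chunks. The backbone feature size (512) and the linear layers are assumptions for illustration, not the repo's actual architecture.

```python
import torch

K, d_learner = 8, 128
feats = torch.randn(32, 512)  # backbone features (512 is an assumed size)

# K independent 128-d learners whose outputs are concatenated ...
learners = [torch.nn.Linear(512, d_learner) for _ in range(K)]
merged = torch.cat([f(feats) for f in learners], dim=1)  # shape (32, 1024)

# ... give exactly one 1024-d embedding split into 8 chunks of 128.
chunks = merged.split(d_learner, dim=1)
assert torch.equal(chunks[0], learners[0](feats))
```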

The split of the embedding space into K chunks serves the purpose of making sampling easier. If you do not divide the embedding space into K chunks, you have to resort to hard example mining, which does not yield comparable results (as most of the triplets are more or less useless).
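
As a rough illustration of that point, a per-learner loss might look like the sketch below: each learner only sees samples assigned to its cluster and only its own d/K-dim slice, so even naive triplet selection stays informative. The function name, shapes, and the simple hardest-positive/closest-negative selection are assumptions for the sketch, not the paper's exact mining scheme.

```python
import torch
import torch.nn.functional as F

def learner_loss(embeddings, labels, cluster_ids, k, chunk=16, margin=0.2):
    """Hypothetical helper: loss for learner k, computed only on samples in
    cluster k and only on learner k's d/K-dim slice of the embedding."""
    mask = cluster_ids == k
    e = embeddings[mask][:, k * chunk:(k + 1) * chunk]
    y = labels[mask]
    # Within a cluster the samples are already similar to each other,
    # so simple triplet selection remains useful without hard mining.
    dist = torch.cdist(e, e)
    pos = y.unsqueeze(0) == y.unsqueeze(1)
    d_pos = (dist * pos.float()).max(dim=1).values                  # hardest positive
    d_neg = dist.masked_fill(pos, float('inf')).min(dim=1).values   # closest negative
    return F.relu(d_pos - d_neg + margin).mean()
```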

@fwahhab89 we split the embedding space not only for the sampling but also to make the learners less correlated.