tamerthamoqa / facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.

Embeddings are getting clustered together in a small region after training

Nrohlable opened this issue

Hi @tamerthamoqa,

Thanks a lot for the fantastic repo, which we have been able to use in our work.
I have been building a face verification system using a Siamese network. With the pre-trained models (trained on the CASIA-WebFace and VGGFace2 datasets) I was getting close to 90% accuracy on my dataset. I then applied a hard-triplet batch sampling and training strategy to fine-tune the network further, but for some reason, after training, the embeddings for all images end up clustered together in a small region. In other words, the embeddings of any two persons become far too close to each other: where the pre-trained models gave a cosine distance of around 0.45 for a given pair, after fine-tuning with this triplet loss the same pair gives about 0.006, and the value barely changes between same-person and different-person pairs.
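
For illustration, here is a minimal sketch of the kind of distance check I mean (the embeddings below are just random placeholder vectors standing in for the model outputs, not the actual pipeline):

```python
import torch
import torch.nn.functional as F

def pair_distances(emb_a: torch.Tensor, emb_b: torch.Tensor):
    """Return (cosine distance, Euclidean distance) for two embedding vectors."""
    cos_dist = 1.0 - F.cosine_similarity(emb_a, emb_b, dim=0).item()
    l2_dist = torch.norm(emb_a - emb_b, p=2).item()
    return cos_dist, l2_dist

# Placeholder 512-d embeddings standing in for model(image_a), model(image_b)
emb_a = F.normalize(torch.randn(512), dim=0)
emb_b = F.normalize(torch.randn(512), dim=0)

cos_d, l2_d = pair_distances(emb_a, emb_b)
print(f"cosine distance:    {cos_d:.4f}")
print(f"euclidean distance: {l2_d:.4f}")
```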

If you could give me any insights on this, that would be helpful.
Thanks

Hello @Nrohlable

I am assuming you used Triplet Loss, which optimizes the embedding space for Euclidean Distance and not Cosine Distance. Does using Euclidean Distance instead of Cosine Distance give similar results?
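
As a side note on why the two metrics may behave similarly: if the embeddings are L2-normalised onto the unit hypersphere (as in the FaceNet paper), squared Euclidean distance and cosine similarity are directly related, so a collapse under one metric implies a collapse under the other. A small sketch of that relationship:

```python
import torch
import torch.nn.functional as F

# For unit-norm embeddings: ||a - b||^2 = 2 - 2 * cos(a, b)
a = F.normalize(torch.randn(512), dim=0)
b = F.normalize(torch.randn(512), dim=0)

l2_squared = torch.sum((a - b) ** 2)
cos_sim = F.cosine_similarity(a, b, dim=0)

print(l2_squared.item(), (2.0 - 2.0 * cos_sim).item())  # the two values match
```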

@tamerthamoqa thanks for responding.

Yes, even if we use Euclidean distance the story doesn't change much.
Earlier, for one pair we were getting around 1.29, and after training the same pair gives 0.07288.

Also, with the pre-trained models the overlap between the Euclidean distances of same-person and different-person image pairs was around 8%, and after this training it went up to around 81%.
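
For clarity, a rough sketch of one way such an overlap could be measured (this is just an illustrative definition with made-up distance values, not necessarily the exact way the 8% / 81% figures above were computed):

```python
import torch

def distance_overlap(genuine: torch.Tensor, impostor: torch.Tensor) -> float:
    """Fraction of impostor (different-person) distances that fall below
    the largest genuine (same-person) distance."""
    threshold = genuine.max()
    return (impostor < threshold).float().mean().item()

# Made-up samples of Euclidean distances for evaluation pairs
genuine = torch.tensor([0.35, 0.42, 0.50, 0.61])   # same-person pairs
impostor = torch.tensor([0.55, 0.90, 1.10, 1.29])  # different-person pairs

print(f"overlap: {distance_overlap(genuine, impostor):.0%}")  # 25% for this sample
```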

Is there any specific practice I should follow while training this kind of network? For example, does the network have to be trained for, let's say, 200 epochs in order to get valid results?
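
To make the failure mode concrete, here is a minimal sketch (using PyTorch's built-in `TripletMarginLoss`, which may differ from the exact loss used in this repo) showing that when anchor, positive and negative embeddings all collapse to the same point, the triplet loss simply plateaus at the margin:

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=0.2, p=2)

# Collapsed case: anchor, positive and negative embeddings are all identical,
# so d(a, p) == d(a, n) == 0 and the loss is stuck at exactly the margin.
collapsed = torch.zeros(8, 512)
print(triplet_loss(collapsed, collapsed, collapsed).item())  # 0.2

# Healthy case: positives close to their anchors, negatives far away,
# so the margin is satisfied and the loss drops to zero.
anchor = torch.randn(8, 512)
positive = anchor + 0.01 * torch.randn(8, 512)
negative = torch.randn(8, 512)
print(triplet_loss(anchor, positive, negative).item())  # ~0.0
```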