tamerthamoqa / facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with triplet loss using the glint360k dataset. A pre-trained model trained with triplet loss is available for download.

about make Triplet dataset

Kim-yonguk opened this issue · comments

Is this a method for generating hard triplets online, or is it offline?

I would say it is online, since you are only selecting the triplets within a batch that pass the hard-negative triplet selection condition, rather than pre-computing the triplets you want to train on by doing a full pass over the training set at the start of each epoch.

Please do keep in mind that my understanding may be wrong. I don't think the triplet generation before training counts as offline mining, since it only randomly generates triplets and does not pre-compute any embeddings to check the selection condition. I used the triplet selection method from tbmoon's 'facenet' repository and edited it to write the generated triplets to a numpy file for some 'reproducibility' across experiments, but the usual way I know of generating triplets is to randomly pick anchors, positives, and negatives on the fly to avoid selection bias.
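To make the online selection step concrete, here is a minimal sketch of filtering a batch of already-computed embeddings down to the triplets that violate the margin (the ones with non-zero triplet loss). The function name and shapes are hypothetical, not taken from the repository:

```python
import torch

def filter_hard_triplets(anchors, positives, negatives, margin=0.2):
    """Hypothetical helper: keep only the triplets in the batch that
    pass the hard-negative selection condition, i.e. where the
    anchor-negative distance is smaller than the anchor-positive
    distance plus the margin. All inputs are (batch, embedding_dim)."""
    pos_dist = (anchors - positives).pow(2).sum(dim=1)
    neg_dist = (anchors - negatives).pow(2).sum(dim=1)
    # Only these triplets contribute a non-zero loss; the rest are
    # "easy" and are discarded for this batch.
    mask = neg_dist < pos_dist + margin
    return anchors[mask], positives[mask], negatives[mask]
```

The filtering happens per batch during training, which is what makes the scheme online rather than offline.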

It seems you need a large batch size to get good performance with the triplet loss method, so you will need a GPU with a lot of VRAM (preferably 24 GB or more) or multiple GPUs in parallel. I believe the original FaceNet paper used a batch size of around 1,800 triplets, enforced roughly 40 face images per identity in each mini-batch, trained on a dataset containing hundreds of millions of face images, and used a semi-hard negative triplet selection method.

It also seems that plain cross-entropy classification on the VGGFace2 dataset with an Inception-ResNet-V1 architecture, as in David Sandberg's 'facenet' repository, yields better results with less instability during training, so giving that a shot wouldn't hurt.
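The classification alternative is just a standard supervised loop: a backbone that maps faces to embeddings, a linear classification head over identity labels, and cross-entropy loss. A minimal sketch, with a tiny stand-in network in place of Inception-ResNet-V1:

```python
import torch
import torch.nn as nn

# Stand-in backbone: in practice this would be Inception-ResNet-V1
# producing a 128-d (or 512-d) face embedding.
class ToyFaceClassifier(nn.Module):
    def __init__(self, embedding_dim=128, num_identities=1000):
        super().__init__()
        # 3x16x16 toy images -> flattened -> embedding -> identity logits
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, embedding_dim))
        self.classifier = nn.Linear(embedding_dim, num_identities)

    def forward(self, x):
        return self.classifier(self.features(x))

model = ToyFaceClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(8, 3, 16, 16)        # toy batch of face crops
labels = torch.randint(0, 1000, (8,))     # identity labels

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

After training, the classification head is dropped and the backbone's embedding output is used for face verification.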

If you find any more information please let me know.

Before we compute the embeddings, we don't know whether the negative in a selected triplet is hard, semi-hard, or easy; the random generation before a pass may yield many "easy" triplets. Because the triplets are only evaluated during training, when they are fed into a "large" mini-batch and only the hard/semi-hard ones are selected, we call the method "online".
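The three categories follow directly from the squared distances once the embeddings exist. A small illustrative helper (the name is hypothetical, but the conditions match the FaceNet definitions):

```python
def categorize_triplet(pos_dist, neg_dist, margin=0.2):
    """Classify a triplet after embedding, given the squared
    anchor-positive and anchor-negative distances:
      hard:      negative is closer to the anchor than the positive
      semi-hard: negative is farther than the positive, but within the margin
      easy:      negative is outside the margin (zero loss)"""
    if neg_dist < pos_dist:
        return "hard"
    if neg_dist < pos_dist + margin:
        return "semi-hard"
    return "easy"
```

Only the "hard" and "semi-hard" cases produce a non-zero triplet loss, which is why the easy ones can be discarded from the batch.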