tamerthamoqa / facenet-pytorch-glint360k

A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.


ResNet-18

glmanhtu opened this issue

Hello @tamerthamoqa

May I ask why you chose ResNet-18 for training? From my understanding, the more data we have, the deeper the network we can use. VGGFace2 has about 3M images, so wouldn't ResNet-50 be a better choice?

Hello glmanhtu,

It is mainly to be able to test the effect of large batch sizes on training, since large batch sizes tend to improve results with Triplet Loss, and to keep the training time manageable, as I am using my own PC for the training experiments. Even with a ResNet-18, training for 10 thousand iterations with the current settings takes around 2 hours and 50 minutes on a TITAN RTX, plus an additional 50 minutes for the triplet generation process with 16 Python processes on an overclocked 9900KF CPU. A ResNet-34 model would take a little more than double the training time for the same number of iterations and would require a smaller batch size to fit into the GPU's memory.
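For context, here is a minimal sketch of what an embedding backbone along these lines could look like. This assumes a torchvision ResNet with its classification head replaced by an embedding layer followed by L2 normalization, FaceNet-style; the class name `ResNetEmbedder` is illustrative and is not the repository's actual model code. Swapping `resnet18` for `resnet34` is the change being weighed here against GPU memory and training time:

```python
# Illustrative sketch only -- not the repository's actual model definition.
# Assumes a torchvision ResNet backbone whose classifier is replaced by an
# embedding layer whose output is L2-normalized, as in FaceNet.
import torch
import torch.nn.functional as F
from torchvision import models


class ResNetEmbedder(torch.nn.Module):  # hypothetical name for illustration
    def __init__(self, embedding_dim: int = 512, backbone: str = "resnet18"):
        super().__init__()
        # Swapping "resnet18" for "resnet34" is the trade-off discussed above:
        # the deeper backbone needs a smaller batch size to fit in GPU memory.
        resnet = getattr(models, backbone)(weights=None)
        in_features = resnet.fc.in_features
        resnet.fc = torch.nn.Linear(in_features, embedding_dim)
        self.backbone = resnet

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the embeddings lie on the unit hypersphere.
        return F.normalize(self.backbone(x), p=2, dim=1)


if __name__ == "__main__":
    model = ResNetEmbedder(embedding_dim=512, backbone="resnet18")
    dummy = torch.randn(4, 3, 224, 224)  # batch of 4 RGB face crops
    print(model(dummy).shape)  # torch.Size([4, 512])
```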

From earlier experiments I haven't noticed much difference in LFW results between CNN architectures like ResNet-18 and ResNet-34, but I haven't yet compared two different architectures that have both finished a large number of "epochs". So it might be the case that a larger model would eventually achieve better performance.

I might do an experiment with a ResNet-34 model once I finish my current experiment with the ResNet-18, but I can't promise anything since I really need my PC for other things.

I see. However, people have found that using a large batch size does not always give better performance:

"It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize." (https://arxiv.org/pdf/1609.04836.pdf)

Therefore, I think decreasing your batch size is one way to improve the performance of your model.

While this is true for most deep learning tasks, from what I have seen in previous experiments with Triplet Loss, large batch sizes seem to be necessary for Metric Learning methods like Triplet Loss. Keep in mind that not all the triplets in the batch are used in training, as they need to meet specific criteria to be valid, e.g. semi-hard negative selection.
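To make the "valid triplet" criterion concrete, here is a small sketch of semi-hard negative selection with a standard margin, assuming Euclidean distances between L2-normalized embeddings. The function name and margin value are illustrative and not taken from this repository's mining code:

```python
# Illustrative sketch of semi-hard negative selection -- not the repository's
# actual triplet-mining code; the function name and margin are just examples.
# A triplet (anchor, positive, negative) is "semi-hard" when the negative is
# farther from the anchor than the positive, but still inside the margin:
#     d(a, p) < d(a, n) < d(a, p) + margin
import torch
import torch.nn.functional as F


def semi_hard_mask(anchor, positive, negative, margin=0.2):
    """Boolean mask over the batch selecting only the semi-hard triplets."""
    d_ap = torch.norm(anchor - positive, p=2, dim=1)
    d_an = torch.norm(anchor - negative, p=2, dim=1)
    return (d_an > d_ap) & (d_an < d_ap + margin)


# Toy usage with random L2-normalized "embeddings": only the selected triplets
# contribute to the loss, so the effective batch is smaller than the number of
# sampled triplets -- which is why sampling a large batch helps.
anchors = F.normalize(torch.randn(1024, 512), dim=1)
positives = F.normalize(torch.randn(1024, 512), dim=1)
negatives = F.normalize(torch.randn(1024, 512), dim=1)

mask = semi_hard_mask(anchors, positives, negatives, margin=0.2)
print(f"{mask.sum().item()} / {mask.numel()} triplets are semi-hard")

if mask.any():
    loss_fn = torch.nn.TripletMarginLoss(margin=0.2)
    loss = loss_fn(anchors[mask], positives[mask], negatives[mask])
```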

However, from what I have seen of Face Recognition training loss functions, they do appear to use large batch sizes: the FaceNet paper that introduced the Triplet Loss method used a batch size of 1800, though I don't remember whether that was 1800 total images per batch or 1800 triplets. And ArcFace, which is considered the current state-of-the-art method, used a batch size of 512 in the paper.

It seems Metric Learning methods benefit from larger batch sizes, but I am not sure if that is also the case in Natural Language Processing.

Interesting,
Thanks