Questions about running validate_lfw() function in train_triplets_loss.py

Question


riverHu233 opened this issue 3 years ago

Hello @tamerthamoqa
I used the validate_lfw() function in my FaceNet project without changing anything, but during evaluation it took almost 2 hours to calculate the distances and other metrics, and it still did not produce a result. My first question: is evaluation expected to take this long because it runs on the CPU instead of the GPU? If so, evaluating after every epoch would be costly, so I also wonder how long it takes to train the whole model. I would be very thankful if you could share your training details so I can figure out whether something is wrong with my code.
Thank you sincerely!

Tamer Thamoqa · Answer 1 · Thu May 06 2021 20:03:43 GMT+0800 (China Standard Time)

Hello riverHu233,

On my system (Titan RTX), a training epoch of 5000 iterations with 140x140 images and 544 triplets per batch takes around 2 hours 11 minutes in maximum performance mode, and the LFW evaluation takes around 10-15 seconds. Multiplying by roughly 30 gives an approximate CPU computation time.
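As a rough sanity check on the figures above, here is the arithmetic, assuming the ~30x CPU factor applies to the LFW evaluation step (the answer does not say exactly which step it refers to). Even on a CPU, evaluation should take minutes rather than hours, which suggests the 2-hour run points to a bug rather than hardware.

```python
# Back-of-the-envelope estimates from the numbers quoted in the answer above.
gpu_eval_seconds = (10, 15)  # LFW evaluation time on a Titan RTX (low, high)
cpu_slowdown = 30            # approximate CPU/GPU factor (assumed to apply to evaluation)

# Estimated CPU evaluation time in minutes for the low and high GPU figures.
cpu_eval_minutes = tuple(s * cpu_slowdown / 60 for s in gpu_eval_seconds)

# Triplets processed in one training epoch: iterations x triplets per batch.
triplets_per_epoch = 5000 * 544

print(cpu_eval_minutes)    # (5.0, 7.5)
print(triplets_per_epoch)  # 2720000
```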

To be honest, I am not sure why it is taking so long on your end.

P.S: I also have an SSD.

riverHu233 · Answer 2 · Thu May 06 2021 21:29:14 GMT+0800 (China Standard Time)

Thanks for your reply. There must be something wrong with my code, since my hardware environment is almost the same as yours, so I should debug it. Thanks again!

riverHu233 · Answer 3 · Fri May 07 2021 08:27:36 GMT+0800 (China Standard Time)

Thanks @tamerthamoqa, you're right: the validation takes only 10 seconds in my code. The problem was a shape mismatch between distances (16200,) and labels (16200, 1). It did not raise any error; the program just never finished.
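A likely reason no error was raised: under NumPy broadcasting rules, combining an array of shape (16200,) with one of shape (16200, 1) silently yields a (16200, 16200) array of about 262 million elements, so every per-threshold comparison becomes vastly more expensive. The sketch below shows the broadcast shape and a defensive metric function that flattens both inputs and checks their shapes. `true_positive_rate` is a hypothetical helper, not the repository's code, and it assumes a smaller distance means "same identity".

```python
import numpy as np


def true_positive_rate(distances, labels, threshold: float) -> float:
    """TPR at one threshold; pairs with distance < threshold are predicted 'same'.

    Hypothetical helper for illustration, not part of train_triplets_loss.py.
    """
    # Flatten both inputs so an (N, 1) column cannot broadcast against an
    # (N,) vector into an (N, N) matrix.
    distances = np.asarray(distances).reshape(-1)
    labels = np.asarray(labels, dtype=bool).reshape(-1)
    if distances.shape != labels.shape:
        raise ValueError(f"shape mismatch: {distances.shape} vs {labels.shape}")

    predictions = distances < threshold
    tp = np.count_nonzero(predictions & labels)   # same pairs predicted same
    fn = np.count_nonzero(~predictions & labels)  # same pairs predicted different
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# The silent broadcast behind the hang: (N,) combined with (N, 1) gives (N, N).
print(np.broadcast_shapes((16200,), (16200, 1)))  # (16200, 16200)

# Labels deliberately passed with the problematic (N, 1) shape.
print(true_positive_rate(np.array([0.2, 0.9, 0.4, 1.3]),
                         np.array([[1], [1], [0], [0]]), 0.5))  # 0.5
```

Flattening with `reshape(-1)` is one option; raising an error when the shapes differ after flattening keeps other mismatches (such as different pair counts) from passing silently.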