Question about ITM pretraining
EagleW opened this issue
Qingyun Wang commented
Hi, @dandelin
I have some questions about ITM pretraining. During ITM pretraining, how are the ITM loss and the WPA loss used? It looks like they are handled separately:
ViLT/vilt/modules/vilt_utils.py
Lines 127 to 139 in 762fd39
Why not simply add up those two losses and backpropagate them together?
ViLT/vilt/modules/objectives.py
Lines 252 to 272 in 762fd39
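To make the question concrete, here is a minimal sketch of what I mean by "add up and backpropagate together". The two loss functions are hypothetical toy stand-ins that share one parameter `w`, the 0.1 weight is arbitrary, and the gradient is estimated by finite differences. None of this is taken from the ViLT code. It only illustrates that the gradient of the summed loss equals the sum of the per-loss gradients.

```python
# Toy illustration: hypothetical stand-ins for the ITM and WPA losses,
# both depending on a single shared parameter w. Not the ViLT implementation.

def itm_loss(w):
    # Stand-in for the matching loss.
    return (w - 1.0) ** 2

def wpa_loss(w):
    # Stand-in for the alignment loss, scaled by an arbitrary weight.
    return 0.1 * (w + 2.0) ** 2

def grad(f, w, eps=1e-6):
    # Central finite-difference estimate of df/dw.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.5

# Option A: add the two losses first, then differentiate once.
grad_of_sum = grad(lambda x: itm_loss(x) + wpa_loss(x), w)

# Option B: differentiate each loss separately, then add the gradients.
sum_of_grads = grad(itm_loss, w) + grad(wpa_loss, w)

print(grad_of_sum, sum_of_grads)
```

If the two values agree, as they should for any differentiable losses, then keeping the losses as separate entries and summing them later would update the shared parameters the same way as summing them up front.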
I also have the same question as #48
Thank you!