Could you provide the pretrain log? Thanks
longkukuhi opened this issue
Thanks for the great paper and code. Could you provide the pretraining log so that I can compare it with my results?
Hi, thanks for your interest in our work. I had a quick look but didn't find it. Let me double-check with team members later and I will let you know.
Please check the log below. Note that train_lr doesn't show the full precision (see the formatting sketch after the log).
{"train_lr": "0.000", "train_loss_mlm": "2.017", "train_loss_ita": "2.352", "train_loss_itm": "0.451", "epoch": 0}
{"train_lr": "0.000", "train_loss_mlm": "1.364", "train_loss_ita": "2.032", "train_loss_itm": "0.347", "epoch": 1}
{"train_lr": "0.000", "train_loss_mlm": "1.303", "train_loss_ita": "1.949", "train_loss_itm": "0.316", "epoch": 2}
{"train_lr": "0.000", "train_loss_mlm": "1.265", "train_loss_ita": "1.900", "train_loss_itm": "0.300", "epoch": 3}
{"train_lr": "0.000", "train_loss_mlm": "1.236", "train_loss_ita": "1.857", "train_loss_itm": "0.288", "epoch": 4}
{"train_lr": "0.000", "train_loss_mlm": "1.214", "train_loss_ita": "1.825", "train_loss_itm": "0.279", "epoch": 5}
{"train_lr": "0.000", "train_loss_mlm": "1.193", "train_loss_ita": "1.797", "train_loss_itm": "0.271", "epoch": 6}
{"train_lr": "0.000", "train_loss_mlm": "1.176", "train_loss_ita": "1.782", "train_loss_itm": "0.263", "epoch": 7}
{"train_lr": "0.000", "train_loss_mlm": "1.159", "train_loss_ita": "1.764", "train_loss_itm": "0.257", "epoch": 8}
{"train_lr": "0.000", "train_loss_mlm": "1.142", "train_loss_ita": "1.741", "train_loss_itm": "0.252", "epoch": 9}
{"train_lr": "0.000", "train_loss_mlm": "1.125", "train_loss_ita": "1.729", "train_loss_itm": "0.246", "epoch": 10}
{"train_lr": "0.000", "train_loss_mlm": "1.111", "train_loss_ita": "1.712", "train_loss_itm": "0.241", "epoch": 11}
{"train_lr": "0.000", "train_loss_mlm": "1.095", "train_loss_ita": "1.696", "train_loss_itm": "0.236", "epoch": 12}
{"train_lr": "0.000", "train_loss_mlm": "1.080", "train_loss_ita": "1.683", "train_loss_itm": "0.231", "epoch": 13}
{"train_lr": "0.000", "train_loss_mlm": "1.066", "train_loss_ita": "1.679", "train_loss_itm": "0.226", "epoch": 14}
{"train_lr": "0.000", "train_loss_mlm": "1.052", "train_loss_ita": "1.669", "train_loss_itm": "0.221", "epoch": 15}
{"train_lr": "0.000", "train_loss_mlm": "1.039", "train_loss_ita": "1.655", "train_loss_itm": "0.216", "epoch": 16}
{"train_lr": "0.000", "train_loss_mlm": "1.024", "train_loss_ita": "1.650", "train_loss_itm": "0.212", "epoch": 17}
{"train_lr": "0.000", "train_loss_mlm": "1.012", "train_loss_ita": "1.652", "train_loss_itm": "0.208", "epoch": 18}
{"train_lr": "0.000", "train_loss_mlm": "1.000", "train_loss_ita": "1.645", "train_loss_itm": "0.203", "epoch": 19}
{"train_lr": "0.000", "train_loss_mlm": "0.989", "train_loss_ita": "1.645", "train_loss_itm": "0.199", "epoch": 20}
{"train_lr": "0.000", "train_loss_mlm": "0.977", "train_loss_ita": "1.639", "train_loss_itm": "0.195", "epoch": 21}
{"train_lr": "0.000", "train_loss_mlm": "0.966", "train_loss_ita": "1.639", "train_loss_itm": "0.191", "epoch": 22}
{"train_lr": "0.000", "train_loss_mlm": "0.957", "train_loss_ita": "1.628", "train_loss_itm": "0.188", "epoch": 23}
{"train_lr": "0.000", "train_loss_mlm": "0.949", "train_loss_ita": "1.634", "train_loss_itm": "0.184", "epoch": 24}
{"train_lr": "0.000", "train_loss_mlm": "0.942", "train_loss_ita": "1.635", "train_loss_itm": "0.181", "epoch": 25}
{"train_lr": "0.000", "train_loss_mlm": "0.935", "train_loss_ita": "1.634", "train_loss_itm": "0.179", "epoch": 26}
{"train_lr": "0.000", "train_loss_mlm": "0.930", "train_loss_ita": "1.639", "train_loss_itm": "0.177", "epoch": 27}
{"train_lr": "0.000", "train_loss_mlm": "0.925", "train_loss_ita": "1.629", "train_loss_itm": "0.175", "epoch": 28}
{"train_lr": "0.000", "train_loss_mlm": "0.921", "train_loss_ita": "1.634", "train_loss_itm": "0.173", "epoch": 29}
Many thanks!
Hello, I wonder whether this log is for the 14M data or the 4M data? I got a much higher MLM loss with 4M pretraining.
It comes from the 4M data.
Thanks for your reply. I got a log like this:
{"train_lr": "0.000", "train_loss_mlm": "2.457", "train_loss_ita": "0.915", "train_loss_itm": "0.464", "epoch": 0}
Haha, let me check my code.
I checked my code and everything is the same. I don't have all the data because some URLs are lost. Could that cause such a big gap?
Also, the loss magnitudes are quite different from what Junnan showed in salesforce/ALBEF#71.
Are you talking about the ITA loss or the MLM loss?
The ITA loss is defined differently here, so that one is naturally different. What confuses me is the rapid descent of the MLM loss and why I get such a low ITA loss. Could it come from the difference in datasets?
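For reference, the in-batch image-text contrastive (ITA) loss is commonly implemented roughly as below (an illustrative sketch, not this repo's exact code); variants that add a momentum queue of extra negatives or use a different temperature will naturally report a different magnitude:

```python
import torch
import torch.nn.functional as F

def ita_loss(image_feats: torch.Tensor, text_feats: torch.Tensor, temp: float = 0.07):
    # image_feats, text_feats: (B, D) embeddings of B aligned image-text pairs.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temp  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: cross-entropy in both retrieval directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

With B in-batch negatives the loss for random features starts near log B, so a queue-based variant with many more negatives begins at a visibly higher value; that alone can explain a magnitude gap between two logs.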
The dataset shouldn't be the root cause; my 4M dataset also misses some URLs.
Oh, I found the mistake. Thanks for your reply. I'll try it again.
I still can't get the right MLM loss even though the other losses are about equal. Is there any preprocessing applied to the captions?
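For reference, ALBEF-style codebases usually clean captions with a small helper roughly like the sketch below (this is my assumption; I'm asking whether this repo does something similar):

```python
import re

# Sketch of typical caption cleaning: lowercase, strip most punctuation,
# collapse whitespace, and truncate to a maximum word count.
def pre_caption(caption: str, max_words: int = 30) -> str:
    caption = re.sub(r'([.!"()*#:;~])', ' ', caption.lower())
    caption = re.sub(r'\s{2,}', ' ', caption).strip()
    words = caption.split(' ')
    if len(words) > max_words:
        caption = ' '.join(words[:max_words])
    return caption
```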
Have you tested your model's performance, for example its zero-shot performance?
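A quick probe along these lines would do it (an illustrative sketch, not the repo's evaluation script): embed a held-out set of aligned pairs and check recall@1 in both directions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_r1(image_feats: torch.Tensor, text_feats: torch.Tensor):
    # image_feats, text_feats: (N, D) embeddings of N aligned image-text pairs.
    sims = F.normalize(image_feats, dim=-1) @ F.normalize(text_feats, dim=-1).t()
    targets = torch.arange(sims.size(0), device=sims.device)
    i2t_r1 = (sims.argmax(dim=1) == targets).float().mean().item()
    t2i_r1 = (sims.argmax(dim=0) == targets).float().mean().item()
    return i2t_r1, t2i_r1  # recall@1 for image->text and text->image
```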