uta-smile / TCL

Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022


Could you provide the pretrain log? Thanks

longkukuhi opened this issue

Thanks for the great paper and code. Could you provide the pretraining log so that I can compare it with my results?

commented

Hi, thanks for your interest in our work. I just had a quick look but didn't find it. Let me double-check with my team members later and I will let you know.

commented

Please check the log below. Note that train_lr doesn't show the full precision.

{"train_lr": "0.000", "train_loss_mlm": "2.017", "train_loss_ita": "2.352", "train_loss_itm": "0.451", "epoch": 0}
{"train_lr": "0.000", "train_loss_mlm": "1.364", "train_loss_ita": "2.032", "train_loss_itm": "0.347", "epoch": 1}
{"train_lr": "0.000", "train_loss_mlm": "1.303", "train_loss_ita": "1.949", "train_loss_itm": "0.316", "epoch": 2}
{"train_lr": "0.000", "train_loss_mlm": "1.265", "train_loss_ita": "1.900", "train_loss_itm": "0.300", "epoch": 3}
{"train_lr": "0.000", "train_loss_mlm": "1.236", "train_loss_ita": "1.857", "train_loss_itm": "0.288", "epoch": 4}
{"train_lr": "0.000", "train_loss_mlm": "1.214", "train_loss_ita": "1.825", "train_loss_itm": "0.279", "epoch": 5}
{"train_lr": "0.000", "train_loss_mlm": "1.193", "train_loss_ita": "1.797", "train_loss_itm": "0.271", "epoch": 6}
{"train_lr": "0.000", "train_loss_mlm": "1.176", "train_loss_ita": "1.782", "train_loss_itm": "0.263", "epoch": 7}
{"train_lr": "0.000", "train_loss_mlm": "1.159", "train_loss_ita": "1.764", "train_loss_itm": "0.257", "epoch": 8}
{"train_lr": "0.000", "train_loss_mlm": "1.142", "train_loss_ita": "1.741", "train_loss_itm": "0.252", "epoch": 9}
{"train_lr": "0.000", "train_loss_mlm": "1.125", "train_loss_ita": "1.729", "train_loss_itm": "0.246", "epoch": 10}
{"train_lr": "0.000", "train_loss_mlm": "1.111", "train_loss_ita": "1.712", "train_loss_itm": "0.241", "epoch": 11}
{"train_lr": "0.000", "train_loss_mlm": "1.095", "train_loss_ita": "1.696", "train_loss_itm": "0.236", "epoch": 12}
{"train_lr": "0.000", "train_loss_mlm": "1.080", "train_loss_ita": "1.683", "train_loss_itm": "0.231", "epoch": 13}
{"train_lr": "0.000", "train_loss_mlm": "1.066", "train_loss_ita": "1.679", "train_loss_itm": "0.226", "epoch": 14}
{"train_lr": "0.000", "train_loss_mlm": "1.052", "train_loss_ita": "1.669", "train_loss_itm": "0.221", "epoch": 15}
{"train_lr": "0.000", "train_loss_mlm": "1.039", "train_loss_ita": "1.655", "train_loss_itm": "0.216", "epoch": 16}
{"train_lr": "0.000", "train_loss_mlm": "1.024", "train_loss_ita": "1.650", "train_loss_itm": "0.212", "epoch": 17}
{"train_lr": "0.000", "train_loss_mlm": "1.012", "train_loss_ita": "1.652", "train_loss_itm": "0.208", "epoch": 18}
{"train_lr": "0.000", "train_loss_mlm": "1.000", "train_loss_ita": "1.645", "train_loss_itm": "0.203", "epoch": 19}
{"train_lr": "0.000", "train_loss_mlm": "0.989", "train_loss_ita": "1.645", "train_loss_itm": "0.199", "epoch": 20}
{"train_lr": "0.000", "train_loss_mlm": "0.977", "train_loss_ita": "1.639", "train_loss_itm": "0.195", "epoch": 21}
{"train_lr": "0.000", "train_loss_mlm": "0.966", "train_loss_ita": "1.639", "train_loss_itm": "0.191", "epoch": 22}
{"train_lr": "0.000", "train_loss_mlm": "0.957", "train_loss_ita": "1.628", "train_loss_itm": "0.188", "epoch": 23}
{"train_lr": "0.000", "train_loss_mlm": "0.949", "train_loss_ita": "1.634", "train_loss_itm": "0.184", "epoch": 24}
{"train_lr": "0.000", "train_loss_mlm": "0.942", "train_loss_ita": "1.635", "train_loss_itm": "0.181", "epoch": 25}
{"train_lr": "0.000", "train_loss_mlm": "0.935", "train_loss_ita": "1.634", "train_loss_itm": "0.179", "epoch": 26}
{"train_lr": "0.000", "train_loss_mlm": "0.930", "train_loss_ita": "1.639", "train_loss_itm": "0.177", "epoch": 27}
{"train_lr": "0.000", "train_loss_mlm": "0.925", "train_loss_ita": "1.629", "train_loss_itm": "0.175", "epoch": 28}
{"train_lr": "0.000", "train_loss_mlm": "0.921", "train_loss_ita": "1.634", "train_loss_itm": "0.173", "epoch": 29}

Many thanks!

Hello, I was wondering whether this log is for the 14M data or the 4M data? I got a much higher MLM loss with 4M pretraining.

commented

It comes from 4M data

> It comes from 4M data

Thanks for your reply. I got a log like this:
{"train_lr": "0.000", "train_loss_mlm": "2.457", "train_loss_ita": "0.915", "train_loss_itm": "0.464", "epoch": 0}
Haha, let me check my code.

> It comes from 4M data

I checked my code and everything is the same. I don't have all the data because some URLs are broken. Would that cause such a big gap?

> It comes from 4M data
> I checked my code and everything is the same. I don't have all the data because some URLs are broken. Would that cause such a big gap?

Also, the loss magnitudes are quite different from what Junnan showed in salesforce/ALBEF#71.

commented

Are you talking about the ITA loss or the MLM loss?

> Are you talking about the ITA loss or the MLM loss?

The definition of the ITA loss is different, so its value is naturally different. What confuses me is the rapid decrease of the MLM loss and why I get such a low ITA loss. Could it come from the difference in datasets?
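For what it's worth, the absolute magnitude of an InfoNCE-style contrastive loss depends heavily on how many candidates each sample is contrasted against and on the temperature, so two implementations with different definitions are not directly comparable. A toy sketch (not TCL's actual loss code) illustrating that a randomly initialized model starts near log of the number of candidates:

```python
# Toy illustration (not TCL's actual loss): with random features, an InfoNCE-style
# image-text contrastive loss starts near log(num_candidates), so the magnitude
# depends on whether you contrast within a batch or against a large queue.
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, temperature = 256, 0.07

for num_candidates in (64, 65536):  # e.g. in-batch negatives vs. a memory queue
    image_feat = F.normalize(torch.randn(8, dim), dim=-1)           # 8 "query" images
    text_feat = F.normalize(torch.randn(num_candidates, dim), dim=-1)
    logits = image_feat @ text_feat.t() / temperature
    targets = torch.arange(8)  # pretend the first 8 texts are the positives
    loss = F.cross_entropy(logits, targets)
    print(f"{num_candidates:6d} candidates: loss ~ {loss.item():.2f}, "
          f"log(N) = {math.log(num_candidates):.2f}")
```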

commented

The dataset shouldn't be the root cause. My 4M dataset also misses some URLs.

> The dataset shouldn't be the root cause. My 4M dataset also misses some URLs.

Oh, I found the mistake. Thanks for your reply. I'll try it again.

> The dataset shouldn't be the root cause. My 4M dataset also misses some URLs.
> Oh, I found the mistake. Thanks for your reply. I'll try it again.

And I still cannot get the right MLM loss even though the other losses are about equal. Is there any preprocessing applied to the captions?
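For context, ALBEF-style pretraining code usually applies some light caption cleaning before tokenization: lowercasing, stripping punctuation, collapsing whitespace, and truncating to a maximum number of words. A sketch of that kind of preprocessing is below; it is illustrative and may differ from what TCL actually ships.

```python
# Hedged sketch of the kind of caption cleaning commonly used in ALBEF-style
# pretraining code (lowercase, strip punctuation, collapse whitespace, cap the
# number of words). Illustrative only; may differ from TCL's exact code.
import re


def clean_caption(caption: str, max_words: int = 30) -> str:
    caption = caption.lower()
    caption = re.sub(r"[,.!?\"'()*#:;~]", " ", caption)   # drop punctuation
    caption = re.sub(r"\s{2,}", " ", caption).strip()     # collapse whitespace
    words = caption.split(" ")
    if len(words) > max_words:                            # truncate long captions
        caption = " ".join(words[:max_words])
    return caption


print(clean_caption("A man, wearing a red hat, rides a bike!"))
# -> "a man wearing a red hat rides a bike"
```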

commented

Have you tested your model's performance, for example its zero-shot performance?
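For reference, a minimal sketch of the kind of zero-shot image-text retrieval check being suggested: embed images and texts, rank by cosine similarity, and measure recall@k. The features here are random placeholders for whatever your pretrained model produces; this is not TCL's evaluation script.

```python
# Minimal sketch of a zero-shot image-text retrieval check: embed images and texts,
# rank texts per image by cosine similarity, report recall@k. The inputs are
# placeholders for your model's features; this is not TCL's evaluation script.
import torch
import torch.nn.functional as F


@torch.no_grad()
def recall_at_k(image_feats: torch.Tensor, text_feats: torch.Tensor, k: int = 1) -> float:
    """Assumes image i and text i are the matching pair; both inputs are (N, dim)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = image_feats @ text_feats.t()               # (N, N) cosine similarities
    topk = sims.topk(k, dim=1).indices                # best-k texts for each image
    targets = torch.arange(sims.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()


# Example with random features (a trained model should score far above chance):
print(recall_at_k(torch.randn(100, 256), torch.randn(100, 256), k=5))
```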