Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Home Page: https://arxiv.org/abs/2305.06988


Fine-tuning results are very poor when using my pretrained checkpoints on QVHighlights.

fake-warrior8 opened this issue · comments

commented

Hi, I used your pretrained SeViLA localizer checkpoint (trained on QVHighlights) to fine-tune on NExT-QA and got results similar to your paper (73.2 vs. 73.8). However, when I used your script to first pretrain a SeViLA localizer myself and then fine-tune on NExT-QA with my own pretrained localizer checkpoint, I got only 45% accuracy in the first epoch (vs. 71% with the checkpoint you provided). I also noticed that your checkpoint is 815 MB while my pretrained localizer checkpoint is 1.4 GB. Is there any post-processing applied to the pretrained SeViLA localizer checkpoint?

Hi,

Did you solve this problem? I am in the same situation; my pretrained SeViLA localizer on QVHighlights is also 1.4 GB.

Best


The pretrained checkpoint includes the BLIP-2 Q-former localizer parameters and some T5 parameters, while the downstream fine-tuning stage requires only the BLIP-2 Q-former localizer plus the original BLIP-2 Q-former answerer parameters. You should combine the pretrained checkpoint with the original BLIP-2 parameters to produce a new checkpoint for downstream fine-tuning.
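The merging step above can be sketched as follows. This is a minimal illustration, not SeViLA's actual conversion code: the key names and the `"loc"` marker are assumptions, so print your own checkpoints' keys first to find the real localizer prefixes.

```python
# Sketch: build a fine-tuning checkpoint by overlaying the pretrained
# localizer parameters onto the original BLIP-2 (answerer) weights.
# Key names below are illustrative, not the actual SeViLA keys.
import torch

def build_finetune_ckpt(pretrained_loc_sd, blip2_sd, loc_marker="loc"):
    """Start from the original BLIP-2 state dict and overwrite only the
    localizer entries with values from the pretrained checkpoint."""
    merged = dict(blip2_sd)
    for key, value in pretrained_loc_sd.items():
        if loc_marker in key:  # assumed naming convention for localizer params
            merged[key] = value
    return merged

# Toy demonstration with made-up key names:
pretrained_loc_sd = {
    "Qformer_loc.weight": torch.zeros(2),  # localizer Q-former (kept)
    "t5_model.weight": torch.ones(2),      # extra T5 params (not copied)
}
blip2_sd = {
    "Qformer.weight": torch.ones(2),       # answerer Q-former (kept)
    "Qformer_loc.weight": torch.ones(2),   # placeholder, overwritten above
}
merged = build_finetune_ckpt(pretrained_loc_sd, blip2_sd)
# torch.save({"model": merged}, "sevila_for_finetuning.pth")
```

Dropping the T5 parameters that are only used during pretraining would also explain the size gap between the two checkpoints (1.4 GB vs. 815 MB).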

Thanks for the reply. Yes, I figured it out by printing the checkpoint keys, and I directly replaced the localizer-related weights in the downloaded checkpoint with my own.
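For anyone hitting the same issue, the replacement described here can be sketched like this; again the key names and the `"loc"` marker are illustrative assumptions, not SeViLA's real keys.

```python
# Sketch: keep the released checkpoint as-is, but overwrite its
# localizer-related entries with your own pretrained values.
import torch

def replace_loc_weights(released_sd, my_loc_sd, marker="loc"):
    """Copy your pretrained localizer weights over the released
    checkpoint's matching localizer entries; leave everything else."""
    out = dict(released_sd)
    for key in released_sd:
        if marker in key and key in my_loc_sd:  # assumed localizer naming
            out[key] = my_loc_sd[key]
    return out

# Toy demonstration with made-up key names:
released_sd = {
    "Qformer_loc.weight": torch.ones(3),  # replaced below
    "Qformer.weight": torch.ones(3),      # answerer, untouched
}
my_loc_sd = {"Qformer_loc.weight": torch.zeros(3)}
patched = replace_loc_weights(released_sd, my_loc_sd)
```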