uta-smile / TCL

Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Question about VQA fine-tuning

czy-orange opened this issue · comments

Hi Jinyu,
Thanks for sharing the code for your great work on TCL. I have some questions about `model_vqa.py`:
1. In the top-k answer selection for each question, shouldn't the code index with `answer_ids[b]` and `answer_atts[b]`?
2. Regarding the use of the text decoder: given `targets_ids = input_ids.masked_fill(input_ids == self.tokenizer.pad_token_id, -100)`, `targets_ids` is identical to `input_ids` except at pad positions. So what is the point of computing the loss and generating the answer a second time?
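For context on the second question, here is a minimal standalone sketch of what that `masked_fill` line does (the `pad_token_id` and token values below are made up for illustration; in the real code they come from the tokenizer). Pad positions are replaced with `-100`, which is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so only real answer tokens contribute to the loss:

```python
import torch

# Hypothetical values for illustration; in model_vqa.py these come
# from the tokenizer's output and self.tokenizer.pad_token_id.
pad_token_id = 0
input_ids = torch.tensor([[101, 7592, 102, 0, 0]])

# Replace pad positions with -100, the default ignore_index of
# torch.nn.CrossEntropyLoss, so padding is skipped in the loss.
targets_ids = input_ids.masked_fill(input_ids == pad_token_id, -100)
print(targets_ids)  # tensor([[ 101, 7592,  102, -100, -100]])
```

So `targets_ids` differs from `input_ids` only at pad positions, which is what motivates the question above.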

Thanks!