uta-smile / TCL

Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Question about VQA fine-tuning

czy-orange opened this issue · comments

Hi Jinyu,
Thanks for sharing the code for your great work on TCL. I have some questions about `model_vqa.py`:
1. In the top-k answer selection for each question, shouldn't the code index with `answer_ids[b]` and `answer_atts[b]`?
2. Regarding the use of the text decoder: given `targets_ids = input_ids.masked_fill(input_ids == self.tokenizer.pad_token_id, -100)`, `targets_ids` is identical to `input_ids` except at pad positions. So what is the point of computing the loss and generating the answer a second time?
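For context on the second question, here is a minimal standalone sketch of what that `masked_fill` line does (the `pad_token_id` and token values below are made up for illustration; in the real code they come from the tokenizer). Pad positions are replaced with `-100`, which is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so only real answer tokens contribute to the loss:

```python
import torch

# Hypothetical values for illustration; in model_vqa.py these come
# from the tokenizer's output and self.tokenizer.pad_token_id.
pad_token_id = 0
input_ids = torch.tensor([[101, 7592, 102, 0, 0]])

# Replace pad positions with -100, the default ignore_index of
# torch.nn.CrossEntropyLoss, so padding is skipped in the loss.
targets_ids = input_ids.masked_fill(input_ids == pad_token_id, -100)
print(targets_ids)  # tensor([[ 101, 7592,  102, -100, -100]])
```

So `targets_ids` differs from `input_ids` only at pad positions, which is what motivates the question above.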

Thanks!