LinWeizheDragon / Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

performance using GPT-3 pretrain

807660937 opened this issue

Thank you for your great work.
I wonder how RA-VQA would perform with GPT-3 as the pretrained model for answer generation.

Thanks for your interest, and sorry for the late reply; I was on holiday.
Regarding your question: it is possible to use GPT-3 as the backbone model (you can simply modify the trainer to do this), since all features are text-based. The RAVQA loss can still be applied, although the answer generation model would not be updated.
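As a rough illustration of that last point, here is a minimal sketch of a retriever-only loss with a frozen generator. It is not code from this repository: `retriever_loss` and the `generate` callable are hypothetical names, `generate` stands in for whatever frozen LLM call you would use (no real GPT-3 API is shown), and the rule for marking documents as helpful or misleading is a simplified reading of the pseudo relevance labels behind the RAVQA loss, so the exact formulation in the trainer may differ.

```python
import math
from typing import Callable, List


def retriever_loss(
    question: str,
    gold_answer: str,
    docs: List[str],
    scores: List[float],
    generate: Callable[[str, str], str],
) -> float:
    """Retriever-side loss when the answer generator is frozen (hypothetical sketch)."""
    # Log-softmax over the retrieval scores gives log p(z_k | x) per document.
    max_s = max(scores)
    log_norm = max_s + math.log(sum(math.exp(s - max_s) for s in scores))
    log_probs = [s - log_norm for s in scores]

    loss = 0.0
    for doc, log_p in zip(docs, log_probs):
        # The frozen model only produces an answer; it receives no update.
        answer = generate(question, doc)
        contains = gold_answer.lower() in doc.lower()
        correct = answer.strip().lower() == gold_answer.lower()
        if contains and correct:
            # Assumed "helpful" document: raise its retrieval probability.
            loss -= log_p
        elif not contains and not correct:
            # Assumed "misleading" document: lower its retrieval probability.
            loss += log_p
        # Documents with mixed signals contribute nothing in this sketch.
    return loss
```

In a real setup the scores would come from the trainable retriever and the loss would be computed on tensors so that gradients flow back to it; plain floats are used here only to keep the sketch dependency-free.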

Given the performance reported in other papers (e.g. KAT), I expect the final performance to increase by 6% overall thanks to the internal knowledge offered by the LLM, though we haven't tried GPT-3 in our project. This is partly because we are more interested in proposing an aspiring concept than in relying on a particular backbone model.