LaKo

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

In this paper, we propose LaKo, a knowledge-driven VQA method based on Late Knowledge-to-Text Injection. To incorporate an external knowledge graph (KG) effectively, we convert its triples into text and inject that text through a late injection mechanism. Finally, we address VQA as a text generation task using an encoder-decoder paradigm.
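The knowledge-to-text step described above can be illustrated with a minimal sketch. It is not taken from this repository: the function names, the `question: ... context: ... knowledge: ...` template, and the use of an image description as the textual stand-in for the image are all assumptions made for illustration.

```python
from typing import List, Tuple

# A KG triple as (head, relation, tail); this layout is an assumption.
Triple = Tuple[str, str, str]


def triple_to_text(triple: Triple) -> str:
    """Verbalize one triple as a short phrase (hypothetical template)."""
    head, relation, tail = triple
    # Turn a relation name such as "used_for" into plain words.
    relation_words = relation.replace("_", " ")
    return f"{head} {relation_words} {tail}"


def build_reader_inputs(
    question: str, image_description: str, triples: List[Triple]
) -> List[str]:
    """Build one text passage per triple for an encoder-decoder reader.

    The knowledge text is appended to the textual input here; the exact
    input format used by LaKo may differ.
    """
    return [
        f"question: {question} context: {image_description} "
        f"knowledge: {triple_to_text(t)}"
        for t in triples
    ]


if __name__ == "__main__":
    passages = build_reader_inputs(
        "What is this fruit rich in?",
        "a bunch of bananas on a table",
        [("banana", "is_a", "fruit"), ("banana", "rich_in", "potassium")],
    )
    for p in passages:
        print(p)
```

Each resulting string could then be tokenized and passed to a text generation model, which would produce the answer as free text.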

Model Architecture

Dependencies

Python 3
PyTorch (>= 1.6.0)
Transformers (version 3.0.2)
NumPy

Train

bash run_okvqa_train.sh

Alternatively, run the full training process, which also produces the attention signal used for iterative training:

bash run_okvqa_full.sh

Test

bash run_okvqa_test.sh

Note: to change training or test parameters, edit the corresponding .sh file.

Our code is based on FiD:

Distilling Knowledge from Reader to Retriever: https://arxiv.org/abs/2012.04584
GitHub link to FiD

MIT License
