LaKo

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

In this paper, we propose LaKo, a knowledge-driven VQA method based on Late Knowledge-to-Text Injection. To incorporate an external knowledge graph (KG) effectively, we convert its triples into text and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm.
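
The knowledge-to-text idea described above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the `question:` / `caption:` / `knowledge:` prefixes, and the assumption that the image is already available as a caption string are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

# A KG triple as (head, relation, tail); this layout is an assumption.
Triple = Tuple[str, str, str]


def triple_to_text(triple: Triple) -> str:
    """Verbalize a triple as a short plain-text statement."""
    head, relation, tail = triple
    # Split CamelCase ("UsedFor") and snake_case ("used_for") relation names.
    relation = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", relation)
    relation = relation.replace("_", " ").lower()
    return f"{head} {relation} {tail}"


def build_inputs(question: str, caption: str,
                 triples: List[Triple], top_k: int = 2) -> List[str]:
    """Build one text passage per triple for a text-only encoder-decoder.

    The knowledge is injected "late": it is appended as text next to the
    question and the (hypothetical) image caption rather than fused into
    visual features.
    """
    prefix = f"question: {question} caption: {caption}"
    return [f"{prefix} knowledge: {triple_to_text(t)}" for t in triples[:top_k]]


if __name__ == "__main__":
    passages = build_inputs(
        "What is this used for?",
        "a person holding an umbrella",
        [("umbrella", "UsedFor", "rain"), ("umbrella", "is_a", "tool")],
    )
    for passage in passages:
        print(passage)
```

In a FiD-style reader, each such passage would be encoded separately and the decoder would attend over all of them to generate the answer; how the triples are retrieved and ranked is outside the scope of this sketch.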

Model Architecture

Dependencies

Python 3
PyTorch (>= 1.6.0)
Transformers (version 3.0.2)
NumPy
faiss-cpu
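
To check an environment against the list above before training, something like the following sketch may help. The `REQUIRED` table simply restates the versions listed here; the helper names are hypothetical, and the script assumes Python 3.8+ (for `importlib.metadata`) and that PyTorch is installed under the distribution name `torch`.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Distribution name -> version constraint taken from the list above.
REQUIRED: Dict[str, Optional[str]] = {
    "torch": ">=1.6.0",
    "transformers": "==3.0.2",
    "numpy": None,
    "faiss-cpu": None,
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1), ignoring local build tags."""
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def satisfies(installed: str, spec: Optional[str]) -> bool:
    """Check an installed version against a '>=' or '==' constraint."""
    if spec is None:
        return True
    operator, wanted = spec[:2], parse_version(spec[2:])
    have = parse_version(installed)
    return have >= wanted if operator == ">=" else have == wanted


def check_environment() -> Dict[str, str]:
    """Return a human-readable status for each required package."""
    report = {}
    for name, spec in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        status = "ok" if satisfies(installed, spec) else f"needs {spec}"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```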

Train

bash run_okvqa_train.sh

Or run the full training process to obtain the attention signal used for iterative training:

bash run_okvqa_full.sh

Test

bash run_okvqa_test.sh

Note:

Please first pre-train LaKo (large version) on VQA2.0, then re-train on OKVQA for better performance.
You can edit the .sh files to modify parameters.

Our code is based on FiD:

Distilling Knowledge from Reader to Retriever: https://arxiv.org/abs/2012.04584
GitHub link to FiD

Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@article{DBLP:journals/corr/abs-2207-12888,
  author    = {Zhuo Chen and
               Yufeng Huang and
               Jiaoyan Chen and
               Yuxia Geng and
               Yin Fang and
               Jeff Z. Pan and
               Ningyu Zhang and
               Wen Zhang},
  title     = {LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text
               Injection},
  journal   = {CoRR},
  volume    = {abs/2207.12888},
  year      = {2022}
}

About

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection [ IJCKG 2022 ]

MIT License
