heyichang / LaKo

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection [ IJCKG 2022 ]

LaKo

In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-Text Injection. To effectively incorporate an external KG, we transfer triples into text and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm.
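
As a rough illustration of the knowledge-to-text idea only, the sketch below verbalizes KG triples as plain sentences and pairs each one with the question. It is not the repository's implementation and does not cover the late injection mechanism. The function names, the `question:` / `caption:` / `knowledge:` prompt layout, the underscore-style relation names, and the use of an image caption as the visual input are all assumptions made for this example.

```python
from typing import List, Tuple

# A KG triple as (head, relation, tail); this layout is an assumption.
Triple = Tuple[str, str, str]


def triple_to_text(triple: Triple) -> str:
    """Verbalize one (head, relation, tail) triple as a plain sentence."""
    head, relation, tail = triple
    # Hypothetical relation names such as "used_for" become "used for".
    return f"{head} {relation.replace('_', ' ')} {tail}."


def build_inputs(question: str, caption: str, triples: List[Triple]) -> List[str]:
    """Pair the question and caption with each verbalized triple.

    Each returned string is one passage. A FiD-style reader encodes the
    passages independently and lets the decoder attend over all of them.
    """
    return [
        f"question: {question} caption: {caption} knowledge: {triple_to_text(t)}"
        for t in triples
    ]


# Made-up example data, not taken from OKVQA.
passages = build_inputs(
    "What is this vehicle used for?",
    "a red fire truck parked on a street",
    [("fire truck", "used_for", "fighting fires"), ("fire truck", "is_a", "vehicle")],
)
for passage in passages:
    print(passage)
```

Each passage is an ordinary string, so it can be tokenized and passed to a text-to-text encoder-decoder like any other input, which is what makes treating VQA as text generation possible.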

Model Architecture

[Figure: LaKo model architecture]

Dependencies

Train

bash run_okvqa_train.sh

Alternatively, run the full training process to obtain the attention signal used for iterative training:

bash run_okvqa_full.sh

Test

bash run_okvqa_test.sh

Note:

  • For better performance, first pre-train LaKo (large version) on VQA2.0, then re-train it on OKVQA.
  • Open the .sh files to modify the training and testing parameters.

Our code is based on FiD (Fusion-in-Decoder).

Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@article{DBLP:journals/corr/abs-2207-12888,
  author    = {Zhuo Chen and
               Yufeng Huang and
               Jiaoyan Chen and
               Yuxia Geng and
               Yin Fang and
               Jeff Z. Pan and
               Ningyu Zhang and
               Wen Zhang},
  title     = {LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text
               Injection},
  journal   = {CoRR},
  volume    = {abs/2207.12888},
  year      = {2022}
}

License: MIT License

