Gary-code / KICNLE

[TIP 2024] The official code of paper "Knowledge-Augmented Visual Question Answering with Natural Language Explanation"

KICNLE

This repository contains the official PyTorch implementation of the paper "Knowledge-Augmented Visual Question Answering with Natural Language Explanation", published in IEEE Transactions on Image Processing (TIP), 2024.

Overview

The KICNLE model answers visual questions iteratively: each answer is refined using the explanation generated in the previous iteration. A knowledge retrieval module supplies relevant and accurate information to the model. The result is high-quality answers and explanations that are consistent with each other and closely tied to the visual content.
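The iterative refinement described above can be sketched as a small loop. This is a minimal illustration and not the repository's implementation: `iterative_answer`, `Prediction`, the `retrieve` and `generate` callables, and the `num_rounds` parameter are all hypothetical names, and the image input is omitted to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    answer: str
    explanation: str


def iterative_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], Tuple[str, str]],
    num_rounds: int = 3,
) -> Prediction:
    """Refine an answer over several rounds, feeding back the last explanation."""
    # Hypothetical retrieval step: fetch knowledge snippets once per question.
    knowledge = retrieve(question)
    answer, explanation = "", ""
    for _ in range(num_rounds):
        # Each round conditions on the explanation from the previous round.
        new_answer, new_explanation = generate(question, knowledge, explanation)
        converged = (new_answer, new_explanation) == (answer, explanation)
        answer, explanation = new_answer, new_explanation
        if converged:
            # Stop early once a round no longer changes the output.
            break
    return Prediction(answer, explanation)


# Toy stand-ins for the retrieval module and the answer/explanation generator.
def toy_retrieve(question: str) -> List[str]:
    return ["bananas are yellow when ripe"]


def toy_generate(question: str, knowledge: List[str], prev_explanation: str):
    if not prev_explanation:
        return "green", "the fruit looks unripe"
    return "yellow", "because " + knowledge[0]


result = iterative_answer("What color is the ripe fruit?", toy_retrieve, toy_generate)
print(result.answer, "|", result.explanation)
```

With the toy functions, the first round produces an initial guess, the second round revises it using the retrieved knowledge, and the third round leaves it unchanged, so the loop stops.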

[Figure: KICNLE model]

Installation

  • Install the Anaconda or Miniconda distribution with Python 3.8
  • Main packages: PyTorch = 1.12, transformers = 4.30
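To confirm that an environment matches the versions listed above, a short check such as the following may help. It is a sketch only: the PyPI distribution names `torch` and `transformers` are assumed, and the helper names `matches` and `check_environment` are hypothetical and not part of this repository.

```python
import sys
from importlib import metadata

# Versions taken from the installation notes; the distribution names
# ("torch", "transformers") are an assumption about how they were installed.
EXPECTED = {"torch": "1.12", "transformers": "4.30"}


def matches(installed: str, expected: str) -> bool:
    """True when the installed version has the expected major.minor prefix."""
    # Drop local build tags such as "+cu113" before comparing.
    base = installed.split("+")[0]
    return base.split(".")[:2] == expected.split(".")[:2]


def check_environment() -> dict:
    """Report whether Python and each main package match the listed versions."""
    report = {"python": sys.version_info[:2] == (3, 8)}
    for package, expected in EXPECTED.items():
        try:
            report[package] = matches(metadata.version(package), expected)
        except metadata.PackageNotFoundError:
            report[package] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'mismatch or missing'}")
```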

Pre-trained Model

  • CLIP ViT-based model
pip install git+https://github.com/openai/CLIP.git
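Once the CLIP package is installed, a quick way to verify it is to load a ViT-based checkpoint. The sketch below makes two assumptions: the variant name `ViT-B/16` is chosen only for illustration (this README does not state which ViT variant the code uses), and `load_clip_backbone` is a hypothetical helper, not a function from this repository.

```python
from typing import Optional


def load_clip_backbone(name: str = "ViT-B/16", device: Optional[str] = None):
    """Load a CLIP model and its image preprocessing transform."""
    # Imported lazily so the function can be defined without the packages present.
    import torch
    import clip

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # clip.available_models() lists the checkpoint names the package can download.
    if name not in clip.available_models():
        raise ValueError(f"unknown CLIP variant: {name}")
    # clip.load returns the model and a torchvision-style preprocessing transform.
    model, preprocess = clip.load(name, device=device)
    return model.eval(), preprocess
```

Calling `load_clip_backbone()` downloads the checkpoint on first use, so it needs network access the first time it runs.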

Training & Evaluation

  • For VQA-X dataset
python vqaX.py
  • For A-OKVQA dataset
python a_okvqa.py
