Iterative Context-Aware Graph Inference for Visual Dialog

Figure: The overall framework of the Context-Aware Graph (CAG).

This is a PyTorch implementation of Iterative Context-Aware Graph Inference for Visual Dialog (CVPR 2020).

If you use this code in your research, please consider citing:

@InProceedings{Guo_2020_CVPR,
author = {Guo, Dan and Wang, Hui and Zhang, Hanwang and Zha, Zheng-Jun and Wang, Meng},
title = {Iterative Context-Aware Graph Inference for Visual Dialog},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Requirements

This code is implemented in PyTorch v0.3.1 and works out of the box with CUDA 9 and cuDNN 7.

Data

Download the VisDial v1.0 dialog json files and images from here.
Download the word counts file for VisDial v1.0 train split from here.
Use Faster R-CNN (from here) to extract image features.
Download pre-trained GloVe word vectors from here.
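
As a quick sanity check on the downloaded word vectors, the sketch below parses the standard GloVe text format (one token per line, followed by its space-separated float components). It is not taken from this repository: the function name `load_glove` and the in-memory sample are illustrative assumptions, and the training script may load the vectors differently.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines into a {word: vector} dictionary.

    Hypothetical helper for illustration; assumes every token is free of
    spaces, which holds for most (but not all) released GloVe files.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample standing in for a real GloVe file.
sample = io.StringIO("the 0.1 -0.2 0.3\ndialog 0.4 0.5 -0.6\n")
glove = load_glove(sample)
```

For a real file, pass an open text-mode file handle (UTF-8 encoded) instead of the `io.StringIO` sample.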

Training

Train the CAG model as:

python train/train_D_1.0.py --cuda

Evaluation

Evaluation of a trained model checkpoint can be done as follows:

python eval/evaluate.py --model_path [path_to_root]/save/XXXXX.pth --cuda

This generates an EvalAI submission file. Submit the JSON file to the online evaluation server to obtain results on the VisDial v1.0 test-std split.

Model	NDCG	MRR	R@1	R@5	R@10	Mean
CAG	56.64	63.49	49.85	80.63	90.15	4.11
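
For readers unfamiliar with the retrieval columns in the table, the following minimal sketch shows how MRR, R@k, and Mean (mean rank) are conventionally computed from the 1-indexed rank of the ground-truth answer among the candidates for each question. It is an illustration under that assumption, not the evaluation code of this repository or of the EvalAI server; the function name `retrieval_metrics` is hypothetical, and NDCG is omitted because it additionally requires dense relevance annotations.

```python
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int], ks=(1, 5, 10)) -> Dict[str, float]:
    """Compute MRR, R@k (both in percent) and mean rank.

    gt_ranks holds the 1-indexed rank of the ground-truth answer
    for each question (hypothetical input format).
    """
    if not gt_ranks:
        raise ValueError("gt_ranks must not be empty")
    n = len(gt_ranks)
    metrics = {
        # Mean reciprocal rank: higher is better.
        "MRR": 100.0 * sum(1.0 / r for r in gt_ranks) / n,
        # Mean rank of the ground-truth answer: lower is better.
        "Mean": sum(gt_ranks) / n,
    }
    for k in ks:
        # Recall@k: share of questions whose ground truth ranks in the top k.
        metrics[f"R@{k}"] = 100.0 * sum(r <= k for r in gt_ranks) / n
    return metrics


# Toy example with four questions (made-up ranks, not real results).
example_metrics = retrieval_metrics([1, 2, 4, 10])
```

Higher is better for MRR and R@k, while a lower Mean indicates that the ground-truth answer is ranked closer to the top.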

Acknowledgements

This code is built on jiasenlu/visDial.pytorch. We thank the developers for doing most of the heavy lifting.
