Visual Question Generation in Tensorflow

It's simple question generator based on visual content written in Tensorflow. The model is quite similar to GRNN in Generating Natural Questions About an Image but I use LSTM instead of GRU. It's quite similar to Google's new AI assistant Allo which can ask question based on image content. Since Mostafazadeh et al. does not released VQG dataset yet, we will use VQA dataset temporarily.

Requirement

Tensorflow, follow the official installation
python 2.7
OpenCV
VQA dataset, go to the dataset website

Data

We will use VQA dataset which contains over 760K questions. We simply follow the steps in original repo to download the data and do some preprocessing. After running their code you should acquire three files: data_prepro.h5, data_prepro.json and data_img.h5, put them in the root directory.

Usage

Train the VQG model:

python main.py --model_path=[where_to_save]

Demo VQG with single image: (you need to download pre-trained VGG19 here)

python main.py --is_train=False --test_image_path=[path_to_image] --test_model_path=[path_to_model]

Experiment Result

Model: How many zebras are in the picture ?

Model: Where is the chair ?

Problem: Since we use VQA dataset which is designed for challenge so its question must be relevant to image content. No wonder the model train from VQA can not ask natural questions like human. We will adapt VQG dataset once it release to ask more meaningful question.

Allo: Google AI Assistant

We also let Allo reply to these images. Here's the result.

TODO

Apply VQG dataset instead of VQA to ask more useful question.

Reference

Generating Natural Questions About an Image, ACL 2016.
show_and_tell.tensorflow
tensorflow-vgg

chingyaoc / VQG-tensorflow