mxnet-vqa

Requirements

This code is written in Python and requires MXNET. The preprocssinng code is in Python.

Data Preprocessing

Here we list two preprocessed data using VQA v1.0 dataset

DATA ONE

Download preprocessed data from VQA_LSTM_CNN

Under Evaluation section, you can download the features. (Train on train set and evaluate on validation set) You will see three files in the folder: data_prepro.h5, data_prepro.json and data_img.h5. data_prepro.h5 contains questions and answers for train and test sets. data_prepro.json contains index map for all words in questions and answers. data_img.h5 contains image features using pretrained VGG19 network. Image feature size is 4096.

DATA TWO

Download original text datasets(annotation and question) from VQA and run

$ python textpreprocess.py

to get vqa_raw_train.json, vqa_raw_test.json and vqa_raw_val.json.

Once you have these, run

$ python prepro.py --input_train_json vqa_raw_train.json --input_val_json vqa_raw_val.json --input_test_json vqa_raw_test.json --num_ans 1000

to get the question features. --num_ans specifiy how many top answers you want to use during training. This will generate two files in your main folder, data_prepro.h5 and data_prepro.json. data_prepro.h5 contains questions and answers for train, validation and test sets. data_prepro.json contains index map for all words in questions and answers.

We use pretrained resnet-152 network to get the image features. Please refer to VQA-MCB. After preprocessing, you should have processed image features stored in .jpg.npz files. Image feature size is 2048*14*14

Training

Basic model

In the basic model, we just concatenate the text and image features. Reference paper is VQA:Visual Question Answering. (We use DATA ONE) Run

$ python basic_train.py

You can also change the model and only use text or image features for training.

Tensor Sketching model

In this model, we add higher order correlation between text and image features to the network. We use tensor sketching to preserve the higher order correlation and also keep reasonable computation complexity. Reference paper is Compact Bilinear Pooling. (We use DATA ONE) Please add thw two initializors in add_init_function.py to mxnet/python/mxnet/initializer.py and run

$ python ts_train.py

Tensor Sketching with Attention model

TO BE DONE........

Testing

After training, you should have saved the model parameters in a .params file. Here we just load the model. Run

$ python test.py

This will generate a .json file. Run

$ python s2i.py

to make the result file readable to VQA Evaluation Tool. Then you can use the VQA Evaluation Tool to evaluate.

shiyangdaisy23 / mxnet-vqa