2017 VQA Challenge Winner (CVPR'17 Workshop)

Prerequisites

For questions and answers, go to the data/ folder and run preproc.py directly.
You'll need to install the Stanford Tokenizer; follow the instructions on its page.
Tokenizing the training questions may take up to 36 hours, even on a Xeon E5 CPU. Tokenizing them with pure Java code should be much faster, since Python NLTK calls the Java binding and Python itself is slow.
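If most of the cost comes from launching Java once per question (an assumption on my part), one way to follow the advice above without writing Java is to send every question through a single Java process. A minimal sketch, assuming a Stanford CoreNLP jar on disk (the jar path is hypothetical) and using PTBTokenizer's `-preserveLines` option so that output line i corresponds to question i:

```python
import subprocess
from typing import List

# Hypothetical path: point this at your own Stanford CoreNLP / tokenizer jar.
STANFORD_JAR = "stanford-corenlp.jar"


def build_command(jar_path: str) -> List[str]:
    """Command that runs PTBTokenizer once over everything on stdin.

    -preserveLines keeps one output line per input line (tokens separated by
    spaces), so tokenized line i lines up with question i.
    """
    return ["java", "-cp", jar_path,
            "edu.stanford.nlp.process.PTBTokenizer", "-preserveLines"]


def to_stdin(questions: List[str]) -> str:
    # One question per line; embedded newlines would break the line alignment.
    return "\n".join(q.replace("\n", " ") for q in questions) + "\n"


def parse_output(stdout: str) -> List[List[str]]:
    # Each output line holds the space-separated tokens of one question.
    return [line.split() for line in stdout.splitlines()]


def tokenize_all(questions: List[str], jar_path: str = STANFORD_JAR) -> List[List[str]]:
    """Tokenize all questions with a single Java process instead of one per question."""
    result = subprocess.run(build_command(jar_path), input=to_stdin(questions),
                            capture_output=True, text=True, check=True)
    tokens = parse_output(result.stdout)
    if len(tokens) != len(questions):
        raise ValueError("tokenizer output is not aligned with the input questions")
    return tokens
```

Usage: `tokenize_all(["What color is the bus?"], "/path/to/stanford-corenlp.jar")`. The length check is there because any dropped or merged line would silently pair questions with the wrong tokens.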
For image features, slightly modify this code to convert the TSV file into an NPY file, coco_features.npy, containing a list of dictionaries in which the key is the image ID and the value is that image's feature array (shape (36, 2048)).
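A minimal sketch of that conversion, assuming the TSV layout of the bottom-up-attention feature release (columns image_id, image_w, image_h, num_boxes, boxes, features, with boxes and features stored as base64-encoded float32). It writes one `{image_id: features}` dictionary per image, which is my literal reading of the format above; check loader.py for the exact layout it expects.

```python
import base64
import csv

import numpy as np

# Assumed column layout of the bottom-up-attention TSV files.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_features(b64: str, num_boxes: int) -> np.ndarray:
    """Decode one base64 float32 blob into a (num_boxes, feat_dim) array."""
    buf = base64.b64decode(b64)
    return np.frombuffer(buf, dtype=np.float32).reshape(num_boxes, -1)


def tsv_to_npy(tsv_path: str, npy_path: str) -> int:
    """Write a list of {image_id: features} dicts to npy_path; return the image count."""
    csv.field_size_limit(2**31 - 1)  # feature blobs exceed csv's default field limit
    entries = []
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            feats = decode_features(row["features"], int(row["num_boxes"]))
            entries.append({int(row["image_id"]): feats})
    # A list of dicts is stored as an object array, so np.load needs allow_pickle=True.
    np.save(npy_path, np.array(entries, dtype=object), allow_pickle=True)
    return len(entries)
```

With 36 boxes per image, each value comes out with shape (36, 2048). Reading the result back requires `np.load("data/coco_features.npy", allow_pickle=True)`.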
Download and extract GloVe into the data/ folder as well.
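For reference, glove.6B.300d.txt is plain text with one word per line followed by its 300 space-separated values. A minimal sketch of turning it into an embedding matrix for a word-to-index vocabulary; the function name and the zero-initialization of unknown words are my own choices, not necessarily what this repo does:

```python
import numpy as np


def load_glove(path: str, vocab: dict, dim: int = 300) -> np.ndarray:
    """Build an embedding matrix for a word->index vocab from a GloVe text file.

    Assumes the vocab indices are 0..len(vocab)-1. Rows for words that GloVe
    does not contain are left as zeros.
    """
    emb = np.zeros((len(vocab), dim), dtype=np.float32)
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            idx = vocab.get(word)
            if idx is not None and len(values) == dim:
                emb[idx] = [float(v) for v in values]
    return emb
```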

Now you should be able to train. Make sure the data/ folder contains at least:

- glove.6B.300d.txt
- vqa_train_final.json
- coco_features.npy
- train_q_dict.p
- train_a_dict.p
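A quick way to confirm the list above before launching training; this helper is hypothetical and not part of the repo:

```python
from pathlib import Path

# The files listed above as the minimum contents of data/.
REQUIRED_FILES = [
    "glove.6B.300d.txt",
    "vqa_train_final.json",
    "coco_features.npy",
    "train_q_dict.p",
    "train_a_dict.p",
]


def missing_files(data_dir: str = "data") -> list:
    """Return the required files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Run `print(missing_files("data"))` from the repository root; an empty list means everything is in place.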

(Update) For convenience, here are the links to the tokenized questions, vqa_train_toked.json and vqa_val_toked.json. You still need to run data/preproc.py to generate vqa_train_final.json, train_q_dict.p, etc.
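By their names, train_q_dict.p and train_a_dict.p appear to be question and answer dictionaries. I haven't reproduced what preproc.py stores in them; as a hypothetical illustration of what such pickles commonly hold, the sketch below maps question words to indices (0 reserved for padding) and keeps the `top_k` most frequent answers as classifier targets. The function names and the `min_count`/`top_k` parameters are mine, not the repo's.

```python
import pickle
from collections import Counter
from typing import Dict, List


def build_vocab(token_lists: List[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each word seen at least min_count times to an index; 0 is left for padding."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}


def build_answer_vocab(answers: List[str], top_k: int) -> Dict[str, int]:
    """Keep the top_k most frequent answers as output classes, most frequent first."""
    most_common = Counter(answers).most_common(top_k)
    return {a: i for i, (a, _) in enumerate(most_common)}


def save_pickle(obj, path: str) -> None:
    # Hypothetical equivalent of writing train_q_dict.p / train_a_dict.p.
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```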

Train with the default parameters:

python main.py --train

Train from a previous checkpoint:

python main.py --train --modelpath=/path/to/saved.pth.tar

To see the tunable parameters, run:

python main.py

Evaluate a saved model:

python main.py --modelpath 'data/ads/save/model-sym-10.pth.tar' --eval --gpu 2 --sym True

This generates result.json (validation set only) in the VQA evaluation format.
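The VQA evaluation format for results is a JSON list with one `{"question_id": int, "answer": str}` object per question. A minimal sketch of writing predictions in that shape; the helper name and the `predictions` dictionary are illustrative, not taken from main.py:

```python
import json
from typing import Dict, List


def write_vqa_results(predictions: Dict[int, str], path: str = "result.json") -> List[dict]:
    """Write predictions as a JSON list of {"question_id": int, "answer": str}."""
    results = [{"question_id": int(qid), "answer": str(ans)}
               for qid, ans in sorted(predictions.items())]
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```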

To view the TensorBoard logs:

tensorboard --logdir /u/rkdoshi/AdsVQA/data/ads/tb

The default classifier is a softmax classifier. A sigmoid multi-label classifier is also implemented, but I was not able to train with it.
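The two classifiers differ mainly in their loss. A softmax makes all answers compete for a single probability mass and is trained against one target answer, while a sigmoid multi-label classifier treats every answer as an independent yes/no output, which allows soft target scores (the paper trains with soft scores and binary cross-entropy). A minimal NumPy sketch of both losses for one question; this is an illustration, not the code in this repo:

```python
import numpy as np


def softmax_ce(logits: np.ndarray, target: int) -> float:
    """Cross-entropy of a softmax over all answers against one target index."""
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])


def sigmoid_bce(logits: np.ndarray, soft_targets: np.ndarray) -> float:
    """Binary cross-entropy summed over answers; each answer is its own label."""
    # log(sigmoid(x)) = -log(1 + exp(-x)) and log(1 - sigmoid(x)) = -log(1 + exp(x)),
    # computed with logaddexp so large |x| does not overflow.
    log_p = -np.logaddexp(0.0, -logits)
    log_not_p = -np.logaddexp(0.0, logits)
    return float(-(soft_targets * log_p + (1.0 - soft_targets) * log_not_p).sum())
```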
Training for 50 epochs reaches around 64.42% training accuracy.
For the output classifier, I did not use the pretrained weights because they are hard to retrieve; instead, I followed eq. 5 in the paper.
To prepare the validation data, you need to uncomment some lines of code in data/preproc.py.
coco_features.npy is a very large file (34 GB including train+val image features); you can split it and modify the data loading in loader.py accordingly.
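One way to split it is one small .npy file per image, loaded only when a sample needs it. A minimal sketch, assuming the list-of-`{image_id: feature}` layout described earlier; the output directory and both helper names are hypothetical, and loader.py would still need to be changed to call something like `load_feature`:

```python
from pathlib import Path

import numpy as np


def split_features(entries, out_dir: str) -> int:
    """Save each {image_id: feature} entry as out_dir/<image_id>.npy; return files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for entry in entries:
        for image_id, feat in entry.items():
            np.save(out / f"{image_id}.npy", np.asarray(feat, dtype=np.float32))
            count += 1
    return count


def load_feature(out_dir: str, image_id: int) -> np.ndarray:
    """Load one image's feature array on demand instead of the whole file."""
    return np.load(Path(out_dir) / f"{image_id}.npy")
```

The split itself still needs one pass over the big file, e.g. `split_features(np.load("data/coco_features.npy", allow_pickle=True), "data/features")`, so it may be easier to write the per-image files directly while converting the TSV.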
This code has been tested with train = train and eval = val; no test data is included.
Issues are welcome!