Cadene / vqa.pytorch

Visual Question Answering in Pytorch

Training MUTAN+Att using the PyTorch code achieves low accuracy

gaopeng-eugene opened this issue

Hi, thank you so much for your code.
Right now, I am trying to replicate your ICCV results with the PyTorch implementation.
Here are the settings:
{'batch_size': None,
'dir_logs': None,
'epochs': None,
'evaluate': False,
'help_opt': False,
'learning_rate': None,
'path_opt': 'options/vqa/mutan_att_trainval.yaml',
'print_freq': 10,
'resume': '',
'save_all_from': None,
'save_model': True,
'st_dropout': None,
'st_fixed_emb': None,
'st_type': None,
'start_epoch': 0,
'vqa_trainsplit': 'train',
'workers': 16}

options

{'coco': {'arch': 'fbresnet152torch', 'dir': 'data/coco', 'mode': 'att'},
'logs': {'dir_logs': 'logs/vqa/mutan_att_trainval'},
'model': {'arch': 'MutanAtt',
'attention': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 310,
'dim_mm': 510,
'dropout_hv': 0,
'dropout_mm': 0.5,
'dropout_q': 0.5,
'dropout_v': 0.5,
'nb_glimpses': 2},
'classif': {'dropout': 0.5},
'dim_q': 2400,
'dim_v': 2048,
'fusion': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 620,
'dim_mm': 510,
'dropout_hq': 0,
'dropout_hv': 0,
'dropout_q': 0.5,
'dropout_v': 0.5},
'seq2vec': {'arch': 'skipthoughts',
'dir_st': 'data/skip-thoughts',
'dropout': 0.25,
'fixed_emb': False,
'type': 'BayesianUniSkip'}},
'optim': {'batch_size': 128, 'epochs': 100, 'lr': 0.0001},
'vqa': {'dataset': 'VQA',
'dir': 'data/vqa',
'maxlength': 26,
'minwcount': 0,
'nans': 2000,
'nlp': 'mcb',
'pad': 'right',
'samplingans': True,
'trainsplit': 'train'}}
Warning: 399/930911 words are not in dictionary, thus set UNK
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Model has 37840812 parameters

Here is the result after 100 epochs:
Epoch: [99][1740/1760] Time 0.403 (0.412) Data 0.000 (0.007) Loss 0.8993 (0.9064) Acc@1 71.094 (73.912) Acc@5 94.531 (94.830)
Epoch: [99][1750/1760] Time 0.387 (0.412) Data 0.000 (0.007) Loss 0.8277 (0.9061) Acc@1 71.875 (73.915) Acc@5 95.312 (94.833)
Val: [900/950] Time 0.138 (0.188) Loss 3.1201 (2.8397) Acc@1 49.219 (52.236) Acc@5 75.000 (78.115)
Val: [910/950] Time 0.189 (0.187) Loss 2.4805 (2.8372) Acc@1 58.594 (52.240) Acc@5 80.469 (78.139)
Val: [920/950] Time 0.210 (0.187) Loss 2.8639 (2.8388) Acc@1 53.125 (52.226) Acc@5 77.344 (78.137)
Val: [930/950] Time 0.179 (0.187) Loss 2.1427 (2.8388) Acc@1 59.375 (52.227) Acc@5 82.031 (78.137)
Val: [940/950] Time 0.151 (0.187) Loss 3.1772 (2.8367) Acc@1 50.781 (52.263) Acc@5 72.656 (78.163)

  • Acc@1 52.266 Acc@5 52.266

Here is the command line I used:

python train.py --vqa_trainsplit train --path_opt options/vqa/mutan_att_train.yaml

To summarize the results: I am training on the train set and evaluating on the val set.
MUTAN+Att: 53
MUTAN+NoAtt: 50

commented

You're looking at the val accuracy, not the open-ended val accuracy. The latter can be obtained using eval_res.py, which is automatically executed after each training epoch: https://github.com/Cadene/vqa.pytorch/blob/master/train.py#L287

eval_res.py writes the open-ended accuracy to a JSON file in the experiment directory (logs). The open-ended accuracy can then be plotted with plotly: https://github.com/Cadene/vqa.pytorch#monitor-training
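For reference, here is a minimal sketch of how one might inspect such a JSON file. The path and key names below are hypothetical placeholders, not the repository's actual output format; check your own logs directory for the real filename produced after each epoch:

```python
import json

# Hypothetical path: look in your experiment's logs directory for the
# actual JSON file written by eval_res.py after each epoch.
path = "logs/vqa/mutan_att_trainval/open_ended_accuracy.json"

with open(path) as f:
    results = json.load(f)

# Print whatever metrics the file contains, e.g. an overall accuracy
# plus per-answer-type breakdowns (yes/no, number, other).
for key, value in results.items():
    print(f"{key}: {value}")
```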

Thank you so much for your quick reply. I will try your suggestion.
Another question from reading your ICCV paper: you compare with other methods in the No Attention and ensemble settings. Why not compare in the single-model attention setting?

A small question: what is the difference between val accuracy and open-ended val accuracy? As far as I know, there are two measurements in VQA: open-ended accuracy and MC.

commented

Why not compare in the single-model attention setting?

It would have been a good idea, but we were really running out of time and space in the paper, so we focused on what we thought was most important.

A small question: what is the difference between val accuracy and open-ended val accuracy?

Look at equation (13) in the paper:
"If the predicted answer appears at least 3 times in the ground truth answers, the accuracy for this example is considered to be 1. Intuitively, this metric takes into account the consensus between annotators."

As far as I know, there are two measurements in VQA: open-ended accuracy and MC.

VQA OpenEnded and VQA MC are two different problems. MC stands for Multiple Choice: the candidate answers are given as inputs, so the model only has to pick among them, whereas in the open-ended setting it must predict over the full answer vocabulary.
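To make the distinction concrete, here is an illustrative sketch (the variable names are placeholders, not identifiers from this repository). Open-ended prediction takes the argmax over the whole answer vocabulary, while MC restricts the argmax to the supplied candidate answers:

```python
import torch

# Placeholder setup: model scores over an answer vocabulary of size 2000
# (the `nans` option above) and an index-to-answer mapping.
scores = torch.randn(2000)
answer_vocab = {i: "answer_%d" % i for i in range(2000)}

# OpenEnded: predict over the full answer vocabulary.
open_ended_pred = answer_vocab[scores.argmax().item()]

# Multiple Choice: the dataset supplies candidate answers; restrict
# the argmax to the indices of those candidates.
choice_indices = [3, 42, 917, 1500]  # indices of the given choices
mc_pred = answer_vocab[max(choice_indices, key=lambda i: scores[i].item())]

print(open_ended_pred, mc_pred)
```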

Thank you so much for your reply.