Cadene / vqa.pytorch

Visual Question Answering in Pytorch

Training MUTAN+Att using the PyTorch code achieves low accuracy

gaopeng-eugene opened this issue

Hi, thank you so much for your code.
Right now, I am trying to replicate your ICCV results with the PyTorch implementation.
Here are the settings:
{'batch_size': None,
'dir_logs': None,
'epochs': None,
'evaluate': False,
'help_opt': False,
'learning_rate': None,
'path_opt': 'options/vqa/mutan_att_trainval.yaml',
'print_freq': 10,
'resume': '',
'save_all_from': None,
'save_model': True,
'st_dropout': None,
'st_fixed_emb': None,
'st_type': None,
'start_epoch': 0,
'vqa_trainsplit': 'train',
'workers': 16}

options

{'coco': {'arch': 'fbresnet152torch', 'dir': 'data/coco', 'mode': 'att'},
'logs': {'dir_logs': 'logs/vqa/mutan_att_trainval'},
'model': {'arch': 'MutanAtt',
'attention': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 310,
'dim_mm': 510,
'dropout_hv': 0,
'dropout_mm': 0.5,
'dropout_q': 0.5,
'dropout_v': 0.5,
'nb_glimpses': 2},
'classif': {'dropout': 0.5},
'dim_q': 2400,
'dim_v': 2048,
'fusion': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 620,
'dim_mm': 510,
'dropout_hq': 0,
'dropout_hv': 0,
'dropout_q': 0.5,
'dropout_v': 0.5},
'seq2vec': {'arch': 'skipthoughts',
'dir_st': 'data/skip-thoughts',
'dropout': 0.25,
'fixed_emb': False,
'type': 'BayesianUniSkip'}},
'optim': {'batch_size': 128, 'epochs': 100, 'lr': 0.0001},
'vqa': {'dataset': 'VQA',
'dir': 'data/vqa',
'maxlength': 26,
'minwcount': 0,
'nans': 2000,
'nlp': 'mcb',
'pad': 'right',
'samplingans': True,
'trainsplit': 'train'}}
Warning: 399/930911 words are not in dictionary, thus set UNK
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Model has 37840812 parameters

Here is the result after 100 epochs:
Epoch: [99][1740/1760] Time 0.403 (0.412) Data 0.000 (0.007) Loss 0.8993 (0.9064) Acc@1 71.094 (73.912) Acc@5 94.531 (94.830)
Epoch: [99][1750/1760] Time 0.387 (0.412) Data 0.000 (0.007) Loss 0.8277 (0.9061) Acc@1 71.875 (73.915) Acc@5 95.312 (94.833)
Val: [900/950] Time 0.138 (0.188) Loss 3.1201 (2.8397) Acc@1 49.219 (52.236) Acc@5 75.000 (78.115)
Val: [910/950] Time 0.189 (0.187) Loss 2.4805 (2.8372) Acc@1 58.594 (52.240) Acc@5 80.469 (78.139)
Val: [920/950] Time 0.210 (0.187) Loss 2.8639 (2.8388) Acc@1 53.125 (52.226) Acc@5 77.344 (78.137)
Val: [930/950] Time 0.179 (0.187) Loss 2.1427 (2.8388) Acc@1 59.375 (52.227) Acc@5 82.031 (78.137)
Val: [940/950] Time 0.151 (0.187) Loss 3.1772 (2.8367) Acc@1 50.781 (52.263) Acc@5 72.656 (78.163)

  • Acc@1 52.266 Acc@5 52.266

Here is the command line I used:

python train.py --vqa_trainsplit train --path_opt options/vqa/mutan_att_train.yaml

To summarize the results: I am training on the train set and evaluating on the val set.
MUTAN+Att: 53
MUTAN+NoAtt: 50

commented

You're looking at the val accuracy, not the open-ended val accuracy. The latter can be obtained using eval_res.py, which is automatically executed after each training epoch: https://github.com/Cadene/vqa.pytorch/blob/master/train.py#L287

eval_res.py writes the open-ended accuracy to a JSON file in the experiment directory (logs). The open-ended accuracy can then be plotted with plotly: https://github.com/Cadene/vqa.pytorch#monitor-training
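For reference, here is a minimal sketch of how one might inspect such a JSON file. The path and key names below are hypothetical placeholders, not the repository's actual output format; check your own logs directory for the real filename produced after each epoch:

```python
import json

# Hypothetical path: look in your experiment's logs directory for the
# actual JSON file written by eval_res.py after each epoch.
path = "logs/vqa/mutan_att_trainval/open_ended_accuracy.json"

with open(path) as f:
    results = json.load(f)

# Print whatever metrics the file contains, e.g. an overall accuracy
# plus per-answer-type breakdowns (yes/no, number, other).
for key, value in results.items():
    print(f"{key}: {value}")
```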

Thank you so much for your quick reply. I will try your suggestion.
Another question from reading your ICCV paper: you compare with other methods in the No Attention and ensemble settings. Why not compare in the single-model attention setting?

A small question: what is the difference between val accuracy and open-ended val accuracy? As far as I know, there are two measurements in VQA: open-ended accuracy and MC.

commented

Why not compare in the single-model attention setting?

It would have been a good idea, but we were really running out of time and space in the paper, so we focused on what we thought was most important.

A small question: what is the difference between val accuracy and open-ended val accuracy?

Look at equation (13) in the paper:
"If the predicted answer appears at least 3 times in the ground truth answers, the accuracy for this example is considered to be 1. Intuitively, this metric takes into account the consensus between annotators."

As far as I know, there are two measurements in VQA: open-ended accuracy and MC.

VQA OpenEnded and VQA MC are two different problems. MC stands for Multiple Choice: the candidate answers are given as inputs, so the model only has to pick among them, whereas in the open-ended setting it must predict over the full answer vocabulary.
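To make the distinction concrete, here is an illustrative sketch (the variable names are placeholders, not identifiers from this repository). Open-ended prediction takes the argmax over the whole answer vocabulary, while MC restricts the argmax to the supplied candidate answers:

```python
import torch

# Placeholder setup: model scores over an answer vocabulary of size 2000
# (the `nans` option above) and an index-to-answer mapping.
scores = torch.randn(2000)
answer_vocab = {i: "answer_%d" % i for i in range(2000)}

# OpenEnded: predict over the full answer vocabulary.
open_ended_pred = answer_vocab[scores.argmax().item()]

# Multiple Choice: the dataset supplies candidate answers; restrict
# the argmax to the indices of those candidates.
choice_indices = [3, 42, 917, 1500]  # indices of the given choices
mc_pred = answer_vocab[max(choice_indices, key=lambda i: scores[i].item())]

print(open_ended_pred, mc_pred)
```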

Thank you so much for your reply.