cdancette / rubi.bootstrap.pytorch

NeurIPS 2019 Paper: RUBi: Reducing Unimodal Biases for Visual Question Answering

Unable to reproduce the accuracy

erobic opened this issue

Hi, I am getting low accuracy on the VQA-CP v2 val set.

Here is the log:

eval_epoch.epoch: 21
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.loss: 9.32
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.loss_mm_q: 4.51
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.loss_q: 4.81
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_top1: 36.54
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_top5: 73.24
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_rubi_top1: 22.48
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_rubi_top5: 73.94
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_q_top1: 6.13
[S 2019-08-22 01:47:37] ...trap/engines/engine.py.120: eval_epoch.accuracy_q_top5: 47.23

What would be a good way to debug this? Maybe I should compare per-type accuracies against the pre-trained model?

The baseline model isn't training well either:

[S 2019-08-22 13:44:22] ...trap/engines/engine.py.120: eval_epoch.epoch: 11
[S 2019-08-22 13:44:22] ...trap/engines/engine.py.120: eval_epoch.loss: 4.28
[S 2019-08-22 13:44:22] ...trap/engines/engine.py.120: eval_epoch.accuracy_top1: 21.74
[S 2019-08-22 13:44:22] ...trap/engines/engine.py.120: eval_epoch.accuracy_top5: 70.43

I just tested with the pre-trained weights, and they only gave 35.36/23.05. I wonder if something went wrong with my features; I downloaded them from this link. Any chance you could provide your logs for the same features?

For me, the performance of the pre-trained model was similar to that of the model I trained:

[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.epoch: 0
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.loss: 9.30
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.loss_mm_q: 4.56
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.loss_q: 4.75
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_top1: 35.36
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_top5: 73.72
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_rubi_top1: 23.05
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_rubi_top5: 74.16
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_q_top1: 6.27
[S 2019-08-22 17:12:43] ...trap/engines/engine.py.101: eval_epoch.accuracy_q_top5: 47.58

Hi @erobic ,

You should not look at the top-1 accuracy to evaluate the model. We use the "open ended" accuracy. It is not displayed in the training logs because it takes time to compute, so it is computed in the background.

You can display it using the compare module:

python -m rubi.compare_vqacp2_rubi -d your/exp/dir
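For context, the "open ended" accuracy is the standard VQA soft-accuracy metric, which gives partial credit based on how many of the ten human annotators agreed with the predicted answer. A minimal sketch of that metric (an illustration of the standard formula, not this repo's own implementation):

```python
def vqa_accuracy(predicted, human_answers):
    # Standard VQA open-ended metric: average over all leave-one-out
    # subsets of the human answers; each subset scores
    # min(#matching answers / 3, 1.0).
    n = len(human_answers)
    accs = []
    for i in range(n):
        others = human_answers[:i] + human_answers[i + 1:]
        matches = sum(1 for a in others if a == predicted)
        accs.append(min(matches / 3.0, 1.0))
    return sum(accs) / n
```

So a prediction that only three of ten annotators gave still scores 0.9 rather than 0 or 1, which is why this metric can diverge noticeably from the top-1 accuracy shown in the logs.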

Got it, thank you!