batra-mlp-lab / visdial-rl

Thanks to your help, I managed to run your code.
I have one more question, though.
I ran

python evaluate.py -useGPU \
    -startFrom checkpoints/abot_rl_ep20.vd \
    -qstartFrom checkpoints/qbot_rl_ep20.vd \
    -evalMode dialog \
    -cocoDir /my/path/to/coco/images/ \
    -cocoInfo /my/path/to/coco.json \
    -beamSize 5

and then ran

cd dialog_output/
python -m http.server 8000

However, I found that the visualized captions were quite different from those in your figure.
There were many "UNK" tokens in my results. Is that expected?
Also, under what conditions can I get results similar to yours?

Regarding why your dialog visualization does not match the figure in the README: the command was missing the line that passes the generated caption file as input instead of the ground-truth (GT) one. 7f3e7e2 fixes this; the updated command should now give a similar dialog visualization.

Coming back to the UNKs in the ground-truth captions: it seems that some of the UNKs at the start are for the word "a", which is odd because the same word is not an UNK elsewhere. This might be a preprocessing issue; I will look into it.

Thanks for your quick response.
As a follow-up, I would like to ask about a minor setting related to this.
When you generated the figure in the README, which "inference" mode (greedy or sample) did you use?

visdial-rl/eval_utils/dialog_generate.py

Line 126 in 7f3e7e2

beamSize=beamSize, inference='greedy')

visdial-rl/eval_utils/dialog_generate.py

Line 130 in 7f3e7e2

beamSize=beamSize, inference='greedy')
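For context on the two options named above: in the usual sense of the terms, greedy decoding picks the most probable token at each step, while sampling draws a token at random according to the predicted probabilities. The snippet below is only a minimal sketch of that general distinction, not the repository's decoding code; `pick_token` and `probs` are hypothetical names introduced for illustration.

```python
import random


def pick_token(probs, inference="greedy", rng=random):
    """Choose the next token index from a list of probabilities.

    Hypothetical helper for illustration; not part of visdial-rl.
    """
    indices = range(len(probs))
    if inference == "greedy":
        # Deterministic: always the index with the highest probability.
        return max(indices, key=probs.__getitem__)
    if inference == "sample":
        # Stochastic: draw one index, weighted by its probability.
        return rng.choices(indices, weights=probs, k=1)[0]
    raise ValueError("inference must be 'greedy' or 'sample'")


probs = [0.1, 0.7, 0.2]
greedy_choice = pick_token(probs, "greedy")
sampled_choice = pick_token(probs, "sample", rng=random.Random(0))
```

With greedy decoding the same input always yields the same output, so a figure generated that way should be reproducible; with sampling, repeated runs can differ unless the random seed is fixed.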

@nirbhayjm I think the UNKs are produced whenever the caption or question contains a capital letter, not specifically for the letter "a".
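A small sketch of how this could happen, assuming the vocabulary holds only lowercase words and lookups are case-sensitive. The vocabulary, the `encode` helper, and the example caption are all made up for illustration and are not taken from the repository's preprocessing code.

```python
def encode(tokens, word2ind, unk="UNK"):
    """Map tokens to indices, falling back to the UNK index (hypothetical helper)."""
    return [word2ind.get(tok, word2ind[unk]) for tok in tokens]


# Toy lowercase-only vocabulary (an assumption for this example).
word2ind = {"UNK": 0, "a": 1, "man": 2, "riding": 3, "horse": 4}
ind2word = {i: w for w, i in word2ind.items()}

caption = "A man riding a horse"

# Without lowercasing, the capitalized "A" is not in the vocabulary.
raw = [ind2word[i] for i in encode(caption.split(), word2ind)]
# Lowercasing before the lookup recovers the word.
lowered = [ind2word[i] for i in encode(caption.lower().split(), word2ind)]

print(raw)      # ['UNK', 'man', 'riding', 'a', 'horse']
print(lowered)  # ['a', 'man', 'riding', 'a', 'horse']
```

If this is the cause, it would also explain why "a" becomes UNK only at the start of a caption, where it is capitalized, and not elsewhere.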

So many UNK in captions