batra-mlp-lab / visdial-challenge-starter-pytorch

visdial-challenge-starter-pytorch/visdialch/data/dataset.py

Line 139 in 556def3

[dialog_round["answer"][:-1] for dialog_round in dialog]

Hi, the code here confuses me. Since 'dialog_round["answer"][:-1]' and 'dialog_round["answer"][1:]' drop the last and the first word respectively, a one-word answer would leave both 'answers_in' and 'answers_out' empty (length 0). In that case, the model would not learn anything from this sample.
I am not sure I understand this correctly, so I look forward to your reply.
Thank you.

I think the implementation is correct.

In the case of generative decoding, we prepend and append start and end tokens respectively here:

visdial-challenge-starter-pytorch/visdialch/data/dataset.py

Line 102 in 556def3

if self.add_boundary_toks:

so dialog_round["answer"] will have 3 tokens (<START>, <ANSWER>, <END>) for single-word answers, and each of the two slices still keeps 2 tokens.
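
To make the slicing concrete, here is a minimal sketch rather than the repository's actual code. The helper name make_decoder_pair is hypothetical, and the "<START>" / "<END>" strings are placeholders for whatever boundary tokens the vocabulary really uses. I am also assuming that answers_in is the [:-1] slice and answers_out is the [1:] slice, as in the usual teacher-forcing setup.

```python
from typing import List, Tuple


def make_decoder_pair(
    answer_tokens: List[str], add_boundary_toks: bool = True
) -> Tuple[List[str], List[str]]:
    """Build (answers_in, answers_out) for one answer (hypothetical helper)."""
    tokens = list(answer_tokens)  # copy so the caller's list is not modified
    if add_boundary_toks:
        # Placeholder boundary tokens, assumed for illustration only.
        tokens = ["<START>"] + tokens + ["<END>"]
    answers_in = tokens[:-1]   # decoder input: everything except the last token
    answers_out = tokens[1:]   # decoder target: everything except the first token
    return answers_in, answers_out


# One-word answer with boundary tokens: both sequences have two tokens.
with_toks = make_decoder_pair(["yes"])

# Same answer without boundary tokens: both sequences are empty,
# which is the situation described in the question.
without_toks = make_decoder_pair(["yes"], add_boundary_toks=False)
```

With boundary tokens, each input position is paired with the next token as its target, so even a one-word answer contributes two prediction steps.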

Discriminative decoding doesn't use answers_in and answers_out; it works with options:

visdial-challenge-starter-pytorch/visdialch/data/dataset.py

Line 194 in 556def3

options, option_lengths = self._pad_sequences(

visdial-challenge-starter-pytorch/visdialch/decoders/disc.py

Line 39 in 556def3

options = batch["opt"]

Let me know if this answers your query.

I get it, thank you so much! Your work is very cool!

The 'answer' would be 0 if the answer is one word