airsplay / R2R-EnvDrop

PyTorch code for the NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"

Questions about Enhanced Speaker

ZhuFengdaaa opened this issue · comments

You describe an enhanced version of the Speaker in Section 3.4.3. However, the geographic information and the actions are only used to compute the attention weights over the features.

I have difficulty understanding why g, a are not used directly to compute the context. Could you point to related work that motivates this design?

Thanks for pointing it out.

I used a "fused hidden state" trick when implementing the attention layer here:

```python
h_tilde = torch.cat((weighted_context, h), 1)
```

Mathematically, it "adds" the information of the query into the retrieved context vector:

```
c   = Att(query, {key})
out = FC([query, c])
```

Thus, the information of g and a is captured by the second LSTM.
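To make the trick concrete, here is a minimal, dependency-light NumPy sketch (not the repository's actual code) of dot-product attention with a fused hidden state: the query `h` is concatenated with the attended context before a final projection, so any information in the query (e.g. g, a) flows into the output even though it only weighted the features during attention. The names `fused_attention` and `W_out` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fused_attention(h, keys, values, W_out):
    """h: (d,) query; keys, values: (n, d); W_out: (2d, d) output weight."""
    scores = keys @ h                     # dot-product attention logits, shape (n,)
    alpha = softmax(scores)               # attention weights over the n features
    weighted_context = alpha @ values     # c = Att(query, {key}), shape (d,)
    # The "fused hidden state": concatenate context with the query itself,
    # mirroring h_tilde = torch.cat((weighted_context, h), 1).
    h_tilde = np.concatenate([weighted_context, h])
    return np.tanh(h_tilde @ W_out)       # out = FC([c, query])

rng = np.random.default_rng(0)
d, n = 4, 3
h = rng.standard_normal(d)
keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
W_out = rng.standard_normal((2 * d, d))
out = fused_attention(h, keys, values, W_out)
print(out.shape)  # (4,)
```

Because the query is carried through the concatenation, the layer after the attention (here the `W_out` projection, in the paper the second LSTM) sees both the retrieved context and the query's own information.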

I am sorry that I forgot to mention it in the paper.