Questions about Enhanced Speaker
ZhuFengdaaa opened this issue · comments
You describe an enhanced version of the Speaker in Section 3.4.3. However, the geographic information and actions are only used to compute the attention weights over the features. I have difficulty understanding why g, a are not used to compute the context directly. Could you point to related work that motivates this design?
Thanks for pointing it out.
I used a trick called "fused hidden state" when implementing the attention layer here:
Line 122 in 4c11585
Mathematically, it "adds" the information of the query into the retrieved context vector:
c = Att(query, {key})
out = FC([query, c])
Thus, the information of g, a
would be captured by the second LSTM.
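The trick above can be sketched as a small PyTorch module. This is a minimal illustration under assumed shapes and a dot-product scoring function, not the repository's actual implementation; the class name `FusedAttention` and the tanh nonlinearity are hypothetical choices:

```python
import torch
import torch.nn as nn


class FusedAttention(nn.Module):
    """Sketch of the "fused hidden state" trick: attend with the query,
    then pass [query, context] through a fully connected layer so the
    query (which here encodes g, a) also flows into the output."""

    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, query, keys):
        # query: (batch, dim); keys: (batch, n, dim)
        # c = Att(query, {key}) with dot-product scores (an assumption)
        scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)   # (batch, n)
        weights = torch.softmax(scores, dim=1)
        c = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)      # (batch, dim)
        # out = FC([query, c]): the query is fused into the context,
        # so the downstream LSTM sees it even though c alone would not carry it.
        return torch.tanh(self.fc(torch.cat([query, c], dim=1)))
```

The concatenation is the key step: even though attention only uses g, a to weight the features, the FC layer re-injects them into the vector that the second LSTM consumes.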
I am sorry that I forgot to mention this in the paper.