Maple728 / MTNet

TensorFlow implementation of the paper http://arxiv.org/abs/1809.02105

Encoder does not seem to attend over the time dimension as described in the paper

shanyu-sys opened this issue

The paper says:

We apply an attention layer to the convolutional layer's output matrix over the time dimension. That is, we can view the matrix as a sequence of d_c-dimensional vectors whose length is T_c. We apply attention over the time dimension so that our model can select relevant times across all time steps adaptively.

After examining the code, if I understand it correctly, attention seems to be applied over the feature dimension rather than the time dimension: the softmax output has size en_conv_hidden_size (i.e., d_c) instead of T_c.
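
For comparison, here is a minimal sketch of attention over the time dimension as the quoted passage describes it (a hypothetical illustration written with tf.keras for brevity, not code from this repo):

```python
import tensorflow as tf

def time_attention(conv_out):
    """Attention over the time dimension.

    conv_out: [batch, T_c, d_c] -- the convolutional output viewed as
    a sequence of T_c vectors of size d_c (en_conv_hidden_size).
    """
    # One scalar score per time step -> [batch, T_c, 1]
    scores = tf.keras.layers.Dense(1)(conv_out)
    # Softmax over the time axis, giving T_c attention weights
    weights = tf.nn.softmax(scores, axis=1)
    # Weighted sum over time -> [batch, d_c]
    return tf.reduce_sum(weights * conv_out, axis=1)
```

Here the softmax output has length T_c, one weight per time step; a softmax whose output has length en_conv_hidden_size would instead be weighting the features.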

Did I read it wrong?