Maple728 / MTNet

TensorFlow implementation of the paper http://arxiv.org/abs/1809.02105

Encoder does not seem to attend over the time dimension as described in the paper

shanyu-sys opened this issue

The paper says:

We apply an attention layer to the convolutional layer's output matrix over the time dimension. That is, we can view the matrix as a sequence of d_c-dimensional vectors whose length is T_c. We apply attention over the time dimension so that our model can select relevant times across all time steps adaptively.

After examining the code, if I understand it correctly, attention seems to be applied over the feature dimension rather than the time dimension: the softmax output has size en_conv_hidden_size (i.e., d_c) instead of T_c.
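
For comparison, here is a minimal sketch of attention over the time dimension as the quoted passage describes it (a hypothetical illustration written with tf.keras for brevity, not code from this repo):

```python
import tensorflow as tf

def time_attention(conv_out):
    """Attention over the time dimension.

    conv_out: [batch, T_c, d_c] -- the convolutional output viewed as
    a sequence of T_c vectors of size d_c (en_conv_hidden_size).
    """
    # One scalar score per time step -> [batch, T_c, 1]
    scores = tf.keras.layers.Dense(1)(conv_out)
    # Softmax over the time axis, giving T_c attention weights
    weights = tf.nn.softmax(scores, axis=1)
    # Weighted sum over time -> [batch, d_c]
    return tf.reduce_sum(weights * conv_out, axis=1)
```

Here the softmax output has length T_c, one weight per time step; a softmax whose output has length en_conv_hidden_size would instead be weighting the features.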

Did I read it wrong?