In models.py line 80, att1 seems to be a duplicate calculation.
wbaiting opened this issue
Thanks for your tutorial!
I found this code in models.py, at line 80:
att1 = self.encoder_att(encoder_out) # (batch_size, num_pixels, attention_dim)
and line 203:
attention_weighted_encoding, alpha = self.attention(encoder_out[:batch_size_t], h[:batch_size_t])
In every iteration of the decoding loop, att1 is recalculated, even though encoder_out does not change between timesteps. Is there something wrong with att1, or is this computation redundant?
I saw an implementation based on TensorFlow: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow/blob/master/model_tensorflow.py
Instead of this code:
att1 = self.encoder_att(encoder_out) # (batch_size, num_pixels, attention_dim)
att2 = self.decoder_att(decoder_hidden) # (batch_size, attention_dim)
att = self.full_att(self.relu(att1 + att2.unsqueeze(1))).squeeze(2)
the author implemented the attention layer with the code below, where the encoder projection (context_encode) is computed outside the per-step attention:
context_encode = context_encode + \
                 tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
                 self.pre_att_b
context_encode = tf.nn.tanh(context_encode)
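For comparison, here is a minimal sketch of how the same idea could look in the PyTorch version: project encoder_out once before the decoding loop and reuse that projection at every timestep. The precompute method and the extra att1 argument are my own additions for illustration, not code from the tutorial:

import torch
import torch.nn as nn

class Attention(nn.Module):
    # Same layers as the tutorial's Attention module, but the encoder projection
    # (att1) can be computed once per batch instead of once per timestep.
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.encoder_att = nn.Linear(encoder_dim, attention_dim)  # projects encoder features
        self.decoder_att = nn.Linear(decoder_dim, attention_dim)  # projects decoder hidden state
        self.full_att = nn.Linear(attention_dim, 1)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def precompute(self, encoder_out):
        # encoder_out: (batch_size, num_pixels, encoder_dim) is fixed for the whole caption,
        # so this projection only needs to run once, before the decoding loop
        return self.encoder_att(encoder_out)  # (batch_size, num_pixels, attention_dim)

    def forward(self, encoder_out, att1, decoder_hidden):
        att2 = self.decoder_att(decoder_hidden)  # (batch_size, attention_dim)
        att = self.full_att(self.relu(att1 + att2.unsqueeze(1))).squeeze(2)  # (batch_size, num_pixels)
        alpha = self.softmax(att)  # attention weights over pixels
        attention_weighted_encoding = (encoder_out * alpha.unsqueeze(2)).sum(dim=1)  # (batch_size, encoder_dim)
        return attention_weighted_encoding, alpha

With that change, the call around line 203 of the decoder could be rewritten along these lines:

att1 = self.attention.precompute(encoder_out)  # once, outside the timestep loop
for t in range(max(decode_lengths)):
    batch_size_t = sum([l > t for l in decode_lengths])
    attention_weighted_encoding, alpha = self.attention(
        encoder_out[:batch_size_t], att1[:batch_size_t], h[:batch_size_t])
    # ... rest of the loop unchanged

Since encoder_att is a plain Linear layer, the result should be numerically identical; the change only moves the encoder projection out of the loop.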