sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning


In models.py, line 80, att1 seems to be a duplicated calculation.

wbaiting opened this issue

Thanks for your tutorial!
I found this code in models.py, at line 80:

        att1 = self.encoder_att(encoder_out)  # (batch_size, num_pixels, attention_dim)

and line 203:

        attention_weighted_encoding, alpha = self.attention(encoder_out[:batch_size_t], h[:batch_size_t])

In every loop iteration, att1 will be calculated repeatedly, even though encoder_out stays the same for the whole decoding loop. Is there something wrong with att1?
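
To illustrate the point (a toy sketch of my own, with made-up dimensions): the projection depends only on encoder_out, which is fixed once the image is encoded, so every timestep repeats the same work:

    import torch
    from torch import nn

    # Toy check (my own sketch, dimensions made up): encoder_att(encoder_out)
    # depends only on encoder_out, which does not change during decoding.
    encoder_dim, attention_dim = 2048, 512
    encoder_att = nn.Linear(encoder_dim, attention_dim)
    encoder_out = torch.randn(4, 196, encoder_dim)  # (batch_size, num_pixels, encoder_dim)

    att1_once = encoder_att(encoder_out)           # computed once, before the loop
    for t in range(3):                             # stands in for the decoding loop
        att1_again = encoder_att(encoder_out)      # what happens now, at every step
        assert torch.equal(att1_once, att1_again)  # identical result each time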
I saw an implementation based on TensorFlow: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow/blob/master/model_tensorflow.py

Instead of this code:

    att1 = self.encoder_att(encoder_out)  # (batch_size, num_pixels, attention_dim)
    att2 = self.decoder_att(decoder_hidden)  # (batch_size, attention_dim)
    att = self.full_att(self.relu(att1 + att2.unsqueeze(1))).squeeze(2)

He implemented the attention layer with the code below:

    context_encode = context_encode + \
        tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
        self.pre_att_b
    context_encode = tf.nn.tanh(context_encode)
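
If it helps, here is a minimal PyTorch sketch of the same idea applied to this tutorial's Attention module. This is my own modification, not the tutorial's code: forward gains an extra att1 argument, and the caller computes the encoder projection once per batch instead of once per timestep:

    import torch
    from torch import nn

    class Attention(nn.Module):
        """Variant of the tutorial's Attention where the encoder projection
        (att1) is precomputed once by the caller rather than at every step."""

        def __init__(self, encoder_dim, decoder_dim, attention_dim):
            super(Attention, self).__init__()
            self.encoder_att = nn.Linear(encoder_dim, attention_dim)
            self.decoder_att = nn.Linear(decoder_dim, attention_dim)
            self.full_att = nn.Linear(attention_dim, 1)
            self.relu = nn.ReLU()
            self.softmax = nn.Softmax(dim=1)

        def forward(self, encoder_out, decoder_hidden, att1):
            # att1 = self.encoder_att(encoder_out), now done once by the caller
            att2 = self.decoder_att(decoder_hidden)  # (batch_size, attention_dim)
            att = self.full_att(self.relu(att1 + att2.unsqueeze(1))).squeeze(2)  # (batch_size, num_pixels)
            alpha = self.softmax(att)  # (batch_size, num_pixels)
            attention_weighted_encoding = (encoder_out * alpha.unsqueeze(2)).sum(dim=1)  # (batch_size, encoder_dim)
            return attention_weighted_encoding, alpha

The call at line 203 would then change roughly like this (a sketch, reusing the loop quoted above):

    # Before the timestep loop, once per batch:
    att1 = self.attention.encoder_att(encoder_out)  # (batch_size, num_pixels, attention_dim)
    # Inside the loop, slice it the same way as encoder_out and h:
    attention_weighted_encoding, alpha = self.attention(
        encoder_out[:batch_size_t], h[:batch_size_t], att1[:batch_size_t])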