Hvass-Labs / TensorFlow-Tutorials

TensorFlow Tutorials with YouTube Videos

Incorrect implementation of Encoder-Decoder in machine translation

harshtikuu opened this issue

Hi, excellent tutorials. In the encoder-decoder structure for machine translation, I noticed that you use the output of the encoder as the initial state for the decoder, but the correct implementation might be to use the hidden state of the encoder as the initial state of the decoder instead.
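In Keras this would look roughly like the following (a minimal sketch with a single GRU layer per side; the layer sizes, vocabulary sizes, and names are placeholders, not taken from the tutorial's actual code):

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, GRU, Dense
from tensorflow.keras.models import Model

state_size = 512  # illustrative

# Encoder: return_state=True exposes the final hidden state explicitly.
encoder_input = Input(shape=(None,), name='encoder_input')
encoder_net = Embedding(input_dim=10000, output_dim=128)(encoder_input)
_, encoder_state = GRU(state_size, return_state=True)(encoder_net)

# Decoder: the encoder's final hidden state initializes the decoder's GRU.
decoder_input = Input(shape=(None,), name='decoder_input')
decoder_net = Embedding(input_dim=10000, output_dim=128)(decoder_input)
decoder_net = GRU(state_size, return_sequences=True)(
    decoder_net, initial_state=encoder_state)
decoder_output = Dense(10000, activation='softmax')(decoder_net)

model = Model(inputs=[encoder_input, decoder_input], outputs=decoder_output)
```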

From the Python Notebook for Tutorial 21:

Note how the encoder uses the normal output from its last GRU-layer as the "thought vector". Research papers often use the internal state of the encoder's last recurrent layer as the "thought vector" instead. But this makes the implementation more complicated and is not necessary when using the GRU. If you were using the LSTM instead, then it would be necessary to use the LSTM's internal states as the "thought vector", because the LSTM actually has two internal vectors, which we would need in order to initialize the two internal states of the decoder's LSTM units.
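To make the LSTM case concrete, here is a minimal sketch of what passing both internal states would look like in Keras (single layer per side, illustrative sizes only, not the tutorial's actual code):

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

state_size = 512  # illustrative

encoder_input = Input(shape=(None,), name='encoder_input')
encoder_net = Embedding(input_dim=10000, output_dim=128)(encoder_input)
# return_state=True makes the LSTM return (output, hidden-state h, cell-state c).
_, state_h, state_c = LSTM(state_size, return_state=True)(encoder_net)

decoder_input = Input(shape=(None,), name='decoder_input')
decoder_net = Embedding(input_dim=10000, output_dim=128)(decoder_input)
# Both internal states are needed to initialize the decoder's LSTM.
decoder_net = LSTM(state_size, return_sequences=True)(
    decoder_net, initial_state=[state_h, state_c])
decoder_output = Dense(10000, activation='softmax')(decoder_net)

model = Model(inputs=[encoder_input, decoder_input], outputs=decoder_output)
```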

There is no single "correct" way of implementing this. I made a simpler and more graceful implementation than what is typically done with the LSTM, because it is quite complicated to pass around the internal states in TensorFlow / Keras.

Furthermore, when you use the output of the GRU instead of the internal state as the "thought vector", you actually get one more step of "learnable processing" before the "thought vector" is passed on to the decoder, although I haven't tested whether that actually improves performance in practice.