Implementation of the concepts in the referenced paper, applied to German-English translation. The model is a 2-layer bidirectional GRU encoder followed by a 4-layer GRU decoder. Data loading uses torchtext, and tokenization uses spaCy.
Translation pairs are taken from the WMT 2016 Multi30k dataset.
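
The encoder-decoder layout described above can be sketched roughly as follows in PyTorch. This is a minimal illustration, not the repository's actual code: the class names, the embedding and hidden sizes, and the vocabulary sizes are all hypothetical, and any paper-specific mechanism is omitted because the paper is not identified here. The sketch also assumes that the encoder's final hidden state is passed directly to the decoder, which works shape-wise because 2 layers times 2 directions gives the 4 hidden-state slices a 4-layer GRU expects.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """2-layer bidirectional GRU over source token ids (sizes are assumptions)."""

    def __init__(self, src_vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(src_vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, num_layers=2,
                          bidirectional=True, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len)
        outputs, hidden = self.gru(self.embedding(src))
        # outputs: (batch, src_len, 2 * hid_dim)
        # hidden:  (2 layers * 2 directions, batch, hid_dim)
        return outputs, hidden


class Decoder(nn.Module):
    """4-layer unidirectional GRU that predicts target tokens."""

    def __init__(self, tgt_vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(tgt_vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, num_layers=4, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab_size)

    def forward(self, tgt_in, hidden):
        # tgt_in: (batch, tgt_len); hidden: (4, batch, hid_dim)
        outputs, hidden = self.gru(self.embedding(tgt_in), hidden)
        return self.out(outputs), hidden


class Seq2Seq(nn.Module):
    """Hypothetical wrapper: encoder state initialises the decoder (an assumption)."""

    def __init__(self, src_vocab_size, tgt_vocab_size):
        super().__init__()
        self.encoder = Encoder(src_vocab_size)
        self.decoder = Decoder(tgt_vocab_size)

    def forward(self, src, tgt_in):
        _, hidden = self.encoder(src)
        logits, _ = self.decoder(tgt_in, hidden)
        return logits


# Toy usage with random token ids (vocabulary sizes are made up).
model = Seq2Seq(src_vocab_size=100, tgt_vocab_size=120)
src = torch.randint(0, 100, (8, 11))     # 8 source sentences, 11 tokens each
tgt_in = torch.randint(0, 120, (8, 9))   # 8 target prefixes, 9 tokens each
logits = model(src, tgt_in)
print(logits.shape)  # torch.Size([8, 9, 120])
```

During training, `logits` would typically be compared against the target sequence shifted by one position using a cross-entropy loss that ignores padding tokens.
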
Training loss, validation loss, and validation accuracy curves: