Neural Machine Translation (NMT)

-- Project Status: Refactor

In this project, I aim to build different sequence-to-sequence machine learning models for the Spanish-to-English translation task using PyTorch. A sequence-to-sequence ("Seq2Seq") model is an end-to-end model comprising two recurrent neural networks:

An Encoder: takes sentences in the source language as input and encodes them into finite dimensional context vectors;
A Decoder: uses the context vector as a seed from which the translated sentences are generated.

For the NMT task, I have adopted four different Seq2Seq architectures:

LSTM Encoder-Decoder
LSTM Encoder-Decoder with attention
Sub-word modelling with character level CNN
The Transformer model

A brief explanation of each approach is provided below.

LSTM Encoder-Decoder Network

The architecture of the LSTM Encoder-Decoder Network is illustrated below.

A bi-directional LSTM is used as the Encoder, since a word in a sentence can depend on another word before or after it. The Encoder encapsulates the information in the source sentence into a context vector, which acts as the initial hidden state for the Decoder network. The implementation of the model is elaborated in this Jupyter Notebook.
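As a minimal sketch of this idea (not the code from the notebook), the encoder below runs a bi-directional LSTM over the source tokens and projects the concatenated final forward and backward states down to the decoder's hidden size. The class name `BiLSTMEncoder`, the projection layers, and all sizes are assumptions made for illustration; padding is not handled here.

```python
import torch
import torch.nn as nn


class BiLSTMEncoder(nn.Module):
    """Hypothetical bi-directional LSTM encoder that also builds the decoder's initial state."""

    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.lstm = nn.LSTM(embed_size, hidden_size, bidirectional=True, batch_first=True)
        # Project the concatenated forward/backward final states to the decoder size.
        self.h_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.c_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, src_ids):
        embedded = self.embedding(src_ids)             # (batch, src_len, embed)
        enc_hiddens, (h_n, c_n) = self.lstm(embedded)  # h_n, c_n: (2, batch, hidden)
        h_cat = torch.cat([h_n[0], h_n[1]], dim=1)     # (batch, 2 * hidden)
        c_cat = torch.cat([c_n[0], c_n[1]], dim=1)
        dec_init = (self.h_proj(h_cat), self.c_proj(c_cat))
        return enc_hiddens, dec_init


# Toy usage with made-up sizes: 4 sentences of 7 token ids each.
encoder = BiLSTMEncoder(vocab_size=100, embed_size=16, hidden_size=32)
src_ids = torch.randint(1, 100, (4, 7))
enc_hiddens, (dec_h0, dec_c0) = encoder(src_ids)
```

The per-position outputs `enc_hiddens` are returned as well, because an attention mechanism would need them later; the plain Encoder-Decoder only uses `dec_h0` and `dec_c0`.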

Remarks:

It is difficult to compress an arbitrary-length source sequence into a single fixed-size context vector. The issue can be mitigated by building the Encoder with stacked LSTM layers, where each layer's outputs are the input sequence to the next layer.
Seq2Seq models are known to lose effectiveness on very long inputs, a consequence of the practical limits of LSTMs. An Encoder-Decoder network with an attention mechanism can help capture long-term dependencies.
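In PyTorch, stacking can be sketched with the `num_layers` argument of `nn.LSTM`. The sizes below are arbitrary assumptions, and this is only an illustration of the remark above, not a configuration used in this project.

```python
import torch
import torch.nn as nn

# Two stacked bi-directional layers: layer 1's outputs are layer 2's inputs.
stacked_lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
                       bidirectional=True, batch_first=True)

inputs = torch.randn(4, 7, 16)  # (batch, src_len, embed)
outputs, (h_n, c_n) = stacked_lstm(inputs)

# outputs holds the top layer only: (batch, src_len, 2 * hidden)
# h_n stacks the final states of every layer and direction:
# (num_layers * 2, batch, hidden)
```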

LSTM Encoder-Decoder with Attention

Global Attention Model (Luong et al., 2015)
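A minimal sketch of one global attention step, assuming the multiplicative ("general") score from Luong et al.: every encoder state is scored against the current decoder state, the scores are normalized with a softmax, and the resulting context vector is combined with the decoder state. The class name `GlobalAttention`, the layer names, and the sizes are illustrative assumptions rather than this project's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalAttention(nn.Module):
    """Hypothetical global attention with the score h_t^T W_a h_s."""

    def __init__(self, enc_size, dec_size):
        super().__init__()
        self.W_a = nn.Linear(enc_size, dec_size, bias=False)
        self.W_c = nn.Linear(enc_size + dec_size, dec_size, bias=False)

    def forward(self, dec_hidden, enc_hiddens, src_mask=None):
        # dec_hidden: (batch, dec); enc_hiddens: (batch, src_len, enc)
        keys = self.W_a(enc_hiddens)                                   # (batch, src_len, dec)
        scores = torch.bmm(keys, dec_hidden.unsqueeze(2)).squeeze(2)   # (batch, src_len)
        if src_mask is not None:
            # src_mask is True for real tokens and False for padding.
            scores = scores.masked_fill(~src_mask, float("-inf"))
        weights = F.softmax(scores, dim=1)                             # attention distribution
        context = torch.bmm(weights.unsqueeze(1), enc_hiddens).squeeze(1)  # (batch, enc)
        attn_hidden = torch.tanh(self.W_c(torch.cat([context, dec_hidden], dim=1)))
        return attn_hidden, weights


# Toy usage: the second sentence has only 3 real tokens out of 5.
attention = GlobalAttention(enc_size=64, dec_size=32)
enc_hiddens = torch.randn(2, 5, 64)
dec_hidden = torch.randn(2, 32)
src_mask = torch.tensor([[True] * 5, [True] * 3 + [False] * 2])
attn_hidden, weights = attention(dec_hidden, enc_hiddens, src_mask)
```

The returned `weights` are what one would plot to visualize the alignment between source and target words.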

Remarks

Needs better handling of unknown (UNK) tokens generated during translation
Cannot use full GPU acceleration due to auto-regressive nature
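The auto-regressive constraint can be seen in a greedy decoding loop: each step needs the token chosen at the previous step, so the steps cannot run in parallel. The sketch below is plain Python; `greedy_decode` and the toy `toy_step` function are hypothetical stand-ins for a real decoder step.

```python
def greedy_decode(step_fn, start_id, end_id, max_len=20):
    """Generate tokens one at a time; step t depends on the output of step t - 1."""
    tokens = [start_id]
    state = None
    for _ in range(max_len):
        scores, state = step_fn(tokens[-1], state)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


def toy_step(prev_id, state):
    """Toy decoder step over a 5-word vocabulary: always prefers prev_id + 1."""
    scores = [0.0] * 5
    scores[min(prev_id + 1, 4)] = 1.0
    return scores, state


decoded = greedy_decode(toy_step, start_id=0, end_id=4)
```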

Sub-word modelling with character level CNN

Hybrid model. Better UNK handling.
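One common way to build word representations from characters, sketched here under my own assumptions (the class name `CharCNNEmbedding`, the kernel size, and all dimensions are illustrative; extras such as a highway layer or a character-level decoder are omitted): embed each character, convolve over the character positions, and max-pool to obtain one vector per word. Because the vector is computed from characters, a word outside the word vocabulary still gets a meaningful representation.

```python
import torch
import torch.nn as nn


class CharCNNEmbedding(nn.Module):
    """Hypothetical character-level CNN that produces one embedding per word."""

    def __init__(self, num_chars, char_embed_size, word_embed_size, kernel_size=5):
        super().__init__()
        self.char_embedding = nn.Embedding(num_chars, char_embed_size, padding_idx=0)
        self.conv = nn.Conv1d(char_embed_size, word_embed_size, kernel_size, padding=1)

    def forward(self, char_ids):
        # char_ids: (batch, sent_len, max_word_len)
        batch, sent_len, word_len = char_ids.shape
        x = self.char_embedding(char_ids.reshape(-1, word_len))  # (B*S, word_len, char_embed)
        x = x.transpose(1, 2)                                    # (B*S, char_embed, word_len)
        x = torch.relu(self.conv(x))                             # (B*S, word_embed, positions)
        x = x.max(dim=2).values                                  # max-pool over positions
        return x.reshape(batch, sent_len, -1)                    # (batch, sent_len, word_embed)


# Toy usage: 2 sentences, 6 words each, every word padded to 12 characters.
char_cnn = CharCNNEmbedding(num_chars=30, char_embed_size=8, word_embed_size=24)
char_ids = torch.randint(1, 30, (2, 6, 12))
word_vectors = char_cnn(char_ids)
```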

Remarks

Computationally expensive

The Transformer model

tbc

Remarks

It can utilize full GPU acceleration

Training

Dataset

The train, dev and test datasets are located here. They contain the following files:

Initialization and Optimization

Uniform initialization. Adam optimizer. The model is trained on a GPU in Google Colab.
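A minimal sketch of this setup, with assumed values: the initialization range of 0.1 and the learning rate of 1e-3 are placeholders, not the settings used for the reported models, and the small `nn.LSTM` stands in for the full translation model.

```python
import torch
import torch.nn as nn

# Stand-in model; the real project would use the full encoder-decoder here.
model = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

# Use the GPU when one is available (e.g. in Google Colab), else fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Uniform initialization of every parameter in [-init_range, init_range] (assumed range).
init_range = 0.1
for param in model.parameters():
    nn.init.uniform_(param, -init_range, init_range)

# Adam optimizer over all model parameters (assumed learning rate).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```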

Evaluation

BLEU is used to evaluate the model performance. The following scores are achieved:

LSTM
Attn
Sub-word
Transformer: Needs to train
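For readers unfamiliar with the metric, here is a simplified sketch of corpus-level BLEU with a single reference per sentence: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is written from the standard definition, uses no smoothing, and is not necessarily the scoring code used in this project; the function names are my own.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU for one reference per hypothesis, without smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Counter intersection clips each n-gram count by its reference count.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
perfect = corpus_bleu([reference], [reference])
partial = corpus_bleu([reference], ["the cat sat on a mat".split()])
```

An exact match scores 1.0, while the hypothesis with one wrong word scores strictly between 0 and 1.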

To-do list

Train the transformer
Investigate different initialization
Build stacked LSTM encoder-decoder with attention
Plot attention and demonstrate alignment during translation
Learning rate scheduler
Use the model architecture for other open-source machine translation datasets

Credits

This project uses the public resources provided in the CS224n: Natural Language Processing with Deep Learning course (Stanford, Winter 2019). I am thankful to the lecturers and TAs for offering such amazing content. I believe this course is one of the best places to learn NLP in 2019.

Contact

Raja Biswas: rajjjabiswas@gmail.com
