rbiswasfc / nmt

Neural Machine Translation using Encoder-Decoder Networks
NLP
PyTorch

Neural Machine Translation (NMT)

-- Project Status: Refactor

In this project, I aim to build different sequence-to-sequence machine learning models for the Spanish-to-English translation task using PyTorch. A sequence-to-sequence, or "Seq2Seq", model is an end-to-end model comprising two recurrent neural networks:

  • An Encoder: takes sentences in the source language as input and encodes them into finite-dimensional context vectors;
  • A Decoder: uses the context vector as a seed from which the translated sentences are generated.

For the NMT task, I have adopted four different Seq2Seq architectures:

  • LSTM Encoder-Decoder
  • LSTM Encoder-Decoder with attention
  • Sub-word modelling with character level CNN
  • The Transformer model

A brief explanation of each approach is provided in the following.

LSTM Encoder-Decoder

The architecture of the LSTM Encoder-Decoder network is illustrated below.

[Figure: The basic LSTM Encoder-Decoder Network.]

A bi-directional LSTM is used as the Encoder, since a word in a sentence can depend on words before or after it. The Encoder encapsulates the information in the source sentence into a context vector, which acts as the initial hidden state for the Decoder network. The implementation of the model is elaborated in this Jupyter Notebook.
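The encoder described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the class name, the layer sizes, and the linear projections that map the concatenated forward and backward states to the decoder's hidden size are all assumptions.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Bi-directional LSTM encoder (illustrative sketch with assumed sizes)."""

    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.lstm = nn.LSTM(embed_size, hidden_size,
                            bidirectional=True, batch_first=True)
        # Assumed design: project the concatenated forward/backward final
        # states down to the decoder's hidden size.
        self.h_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        self.c_proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) tensor of token indices
        embedded = self.embedding(src_ids)              # (batch, src_len, embed_size)
        enc_hiddens, (h_n, c_n) = self.lstm(embedded)   # h_n: (2, batch, hidden_size)
        # h_n[0] is the final forward state, h_n[1] the final backward state.
        h_cat = torch.cat((h_n[0], h_n[1]), dim=1)      # (batch, 2 * hidden_size)
        c_cat = torch.cat((c_n[0], c_n[1]), dim=1)
        dec_init = (self.h_proj(h_cat), self.c_proj(c_cat))
        return enc_hiddens, dec_init


encoder = Encoder(vocab_size=100, embed_size=16, hidden_size=32)
src = torch.randint(1, 100, (4, 7))        # 4 sentences, 7 tokens each
enc_hiddens, (h0, c0) = encoder(src)
print(enc_hiddens.shape, h0.shape)
```

The pair `(h0, c0)` plays the role of the context vector: it would be passed to the decoder LSTM as its initial hidden and cell state.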

Remarks:

  • It is difficult to compress an arbitrary-length source sequence into a single fixed-size context vector. The issue can be mitigated by building the Encoder with stacked LSTM layers: each layer’s outputs are the input sequence to the next layer.

  • Seq2Seq models are known to lose effectiveness on very long inputs, a consequence of the practical limits of LSTMs. An Encoder-Decoder network with an attention mechanism can help capture long-term dependencies.
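As a small illustration of the stacked-LSTM remark, PyTorch's `nn.LSTM` stacks layers through its `num_layers` argument, feeding each layer's output sequence to the next layer. The sizes below are arbitrary and chosen only for the example.

```python
import torch
import torch.nn as nn

# Two stacked bi-directional LSTM layers; all sizes are illustrative assumptions.
stacked_lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2,
                       bidirectional=True, batch_first=True)

embedded = torch.randn(4, 7, 16)               # (batch, src_len, embed_size)
outputs, (h_n, c_n) = stacked_lstm(embedded)

# outputs holds the top layer's states: (batch, src_len, 2 * hidden_size)
# h_n holds the final state of every layer and direction:
# (num_layers * 2, batch, hidden_size)
print(outputs.shape, h_n.shape)
```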

LSTM Encoder-Decoder with attention

  • Global Attention Model (Luong et al., 2015)
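A minimal sketch of one decoding step of Luong-style global attention, assuming the "general" (bilinear) score function. The class name, parameter names, and sizes are hypothetical, and the project's notebook may use a different score or masking scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalAttention(nn.Module):
    """Luong-style global attention with the "general" score (illustrative sketch)."""

    def __init__(self, enc_size, dec_size):
        super().__init__()
        self.W_a = nn.Linear(enc_size, dec_size, bias=False)
        self.W_c = nn.Linear(enc_size + dec_size, dec_size, bias=False)

    def forward(self, dec_hidden, enc_hiddens, src_mask=None):
        # dec_hidden: (batch, dec_size); enc_hiddens: (batch, src_len, enc_size)
        keys = self.W_a(enc_hiddens)                                    # (batch, src_len, dec_size)
        scores = torch.bmm(keys, dec_hidden.unsqueeze(2)).squeeze(2)    # (batch, src_len)
        if src_mask is not None:
            # src_mask is True at padding positions, which must get zero weight.
            scores = scores.masked_fill(src_mask, float("-inf"))
        weights = F.softmax(scores, dim=1)                              # alignment weights
        context = torch.bmm(weights.unsqueeze(1), enc_hiddens).squeeze(1)  # (batch, enc_size)
        # Combine the context vector with the decoder state.
        attn_hidden = torch.tanh(self.W_c(torch.cat((context, dec_hidden), dim=1)))
        return attn_hidden, weights


attention = GlobalAttention(enc_size=64, dec_size=32)
enc_states = torch.randn(4, 7, 64)
dec_state = torch.randn(4, 32)
pad_mask = torch.zeros(4, 7, dtype=torch.bool)
pad_mask[:, 5:] = True                     # pretend the last two positions are padding
attn_hidden, weights = attention(dec_state, enc_states, pad_mask)
```

Because the decoder looks at every encoder state at each step, the single context vector is no longer the only channel between source and target. The returned `weights` are also what one would plot to visualize alignments.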

Remarks

  • Needs better handling of tokens generated during translation
  • Cannot use full GPU acceleration because of its sequential, auto-regressive nature: each time step depends on the previous one

Sub-word modelling with character level CNN

Hybrid model. Better UNK handling.
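One common way to build word representations from characters is a character-level CNN: embed each character, convolve over the character sequence, and max-pool over positions. Because the representation is built from characters, rare or unseen words still receive an embedding. The sketch below shows that general pattern under assumed names and sizes; it is not necessarily the architecture used in this project.

```python
import torch
import torch.nn as nn


class CharCNNEmbedding(nn.Module):
    """Word embeddings computed from characters (illustrative sketch)."""

    def __init__(self, num_chars, char_embed_size, word_embed_size, kernel_size=5):
        super().__init__()
        self.char_embedding = nn.Embedding(num_chars, char_embed_size, padding_idx=0)
        self.conv = nn.Conv1d(char_embed_size, word_embed_size,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (num_words, max_word_len) tensor of character indices
        x = self.char_embedding(char_ids)      # (num_words, max_word_len, char_embed_size)
        x = x.transpose(1, 2)                  # Conv1d expects (N, channels, length)
        x = torch.relu(self.conv(x))           # (num_words, word_embed_size, max_word_len)
        word_embeddings, _ = x.max(dim=2)      # max over character positions
        return word_embeddings


char_cnn = CharCNNEmbedding(num_chars=30, char_embed_size=8, word_embed_size=24)
char_ids = torch.randint(1, 30, (6, 12))       # 6 words, 12 characters each
word_vectors = char_cnn(char_ids)
print(word_vectors.shape)
```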

Remarks

  • Computationally expensive

The Transformer model

TBC.

Remarks

  • It can utilize full GPU acceleration
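To illustrate why the Transformer parallelizes well, here is a sketch of scaled dot-product self-attention, its core operation. All positions of the sequence are processed in one batched matrix multiplication, with no loop over time steps. The function name and tensor sizes are assumptions for the example, and multi-head projections and masking are omitted.

```python
import math
import torch


def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    scores = torch.bmm(q, k.transpose(1, 2)) / math.sqrt(q.size(-1))  # (batch, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)     # each row sums to 1
    return torch.bmm(weights, v), weights


x = torch.randn(2, 5, 8)                        # (batch, seq_len, d_k)
attended, attn_weights = scaled_dot_product_attention(x, x, x)
print(attended.shape, attn_weights.shape)
```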

Training

Dataset

The train, dev and test datasets are located here. They contain the following files

Initialization and Optimization

Uniform initialization. Adam optimizer. The model is trained on a GPU in Google Colab.
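A minimal sketch of this setup in PyTorch. The initialization range, the learning rate, and the stand-in model are assumptions, since the values actually used are not stated here.

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32)   # stand-in for the real NMT model

uniform_init = 0.1        # assumed range; the value actually used is not stated
for param in model.parameters():
    nn.init.uniform_(param, -uniform_init, uniform_init)

# Use the GPU when one is available (as in Colab), otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr is an assumption
```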

Evaluation

BLEU is used to evaluate model performance. The following scores are achieved:

  • LSTM
  • Attn
  • Sub-word
  • Transformer: not yet trained
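For reference, corpus-level BLEU combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty. The sketch below is a simplified pure-Python version that assumes one reference per hypothesis and applies no smoothing. It is not the evaluation script used for the scores in this project; in practice a library implementation such as NLTK's `corpus_bleu` is the usual choice.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis (lists of tokens)."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0    # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
perfect = corpus_bleu([reference], [reference])
disjoint = corpus_bleu([reference], ["a dog ran in a park".split()])
```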

To-do list

  • Train the transformer
  • Investigate different initialization
  • Build stacked LSTM encoder-decoder with attention
  • Plot attention and demonstrate alignment during translation
  • Learning rate scheduler
  • Use the model architecture for other open-source machine translation datasets

Credits

This project uses the public resources provided in the CS224n: Natural Language Processing with Deep Learning (Stanford, Winter 2019) course. I am thankful to the lecturers and TAs for offering such amazing content. I believe this course is one of the best places to learn NLP in 2019.

Contact
