This repo contains tutorials on understanding and implementing sequence-to-sequence (seq2seq) models with PyTorch 1.0 and TorchText 0.3 on Python 3.6.
If you find any mistakes or disagree with any of the explanations, please do not hesitate to submit an issue. I welcome any feedback, positive or negative!
To install PyTorch, see installation instructions on the PyTorch website.
To install TorchText:
pip install torchtext
We'll also make use of spaCy to tokenize our data. To install spaCy, follow the instructions on the spaCy website, making sure to install both the English and German models with:
python -m spacy download en
python -m spacy download de
-
1 - Sequence to Sequence Learning with Neural Networks
This first tutorial covers the workflow of a seq2seq project built with PyTorch and TorchText. We'll cover the basics of seq2seq networks using encoder-decoder models, how to implement these models in PyTorch, and how to use TorchText to do all of the heavy lifting with regard to text processing. The model itself will be based on an implementation of Sequence to Sequence Learning with Neural Networks, which uses multi-layer LSTMs.
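To give a feel for the encoder-decoder idea before opening the tutorial, here is a minimal sketch of a multi-layer LSTM encoder and decoder. The class names (`Encoder`, `Decoder`), the vocabulary sizes, the layer dimensions and the use of index 0 as a start-of-sequence token are all assumptions made for illustration; they are not necessarily what the tutorial itself uses.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Reads the source sentence and returns the final LSTM states."""

    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers)

    def forward(self, src):
        # src: [src_len, batch_size]
        embedded = self.embedding(src)
        _, (hidden, cell) = self.rnn(embedded)
        # hidden, cell: [n_layers, batch_size, hid_dim]
        return hidden, cell


class Decoder(nn.Module):
    """Predicts the next target token from the previous token and states."""

    def __init__(self, vocab_size, emb_dim, hid_dim, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers)
        self.fc_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden, cell):
        # token: [batch_size] -> add a sequence dimension of length 1
        embedded = self.embedding(token.unsqueeze(0))
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        logits = self.fc_out(output.squeeze(0))  # [batch_size, vocab_size]
        return logits, hidden, cell


torch.manual_seed(0)
SRC_VOCAB, TRG_VOCAB = 20, 25  # hypothetical vocabulary sizes
encoder = Encoder(SRC_VOCAB, emb_dim=8, hid_dim=16, n_layers=2)
decoder = Decoder(TRG_VOCAB, emb_dim=8, hid_dim=16, n_layers=2)

src = torch.randint(0, SRC_VOCAB, (5, 3))       # 5 tokens, batch of 3
hidden, cell = encoder(src)                     # the "compressed" sentence
first_token = torch.zeros(3, dtype=torch.long)  # assume index 0 is <sos>
logits, hidden, cell = decoder(first_token, hidden, cell)
```

The only information passed from encoder to decoder here is the pair of final states, which is why the encoder and decoder must share `hid_dim` and `n_layers` in this sketch.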
-
2 - Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Now that we have the basic workflow covered, this tutorial will focus on improving our results. Building on the knowledge of PyTorch and TorchText gained from the previous tutorial, we'll cover a second model, which helps with the "information compression" problem faced by encoder-decoder models. This model will be based on an implementation of Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, which uses GRUs.
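One way to ease the "information compression" problem, and the approach I assume here, is to hand the encoder's context vector to the decoder at every time step instead of only using it as the initial hidden state. The sketch below shows a single-layer GRU decoder doing this. The name `ContextDecoder`, the dimensions and the greedy decoding loop are illustrative assumptions, not the tutorial's exact code.

```python
import torch
import torch.nn as nn


class ContextDecoder(nn.Module):
    """GRU decoder that sees the encoder context vector at every step."""

    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # The GRU input is the token embedding concatenated with the context.
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim)
        # The output layer sees the embedding, the new hidden state and the context.
        self.fc_out = nn.Linear(emb_dim + hid_dim * 2, vocab_size)

    def forward(self, token, hidden, context):
        # token: [batch_size]; hidden, context: [1, batch_size, hid_dim]
        embedded = self.embedding(token.unsqueeze(0))
        rnn_input = torch.cat((embedded, context), dim=2)
        _, hidden = self.rnn(rnn_input, hidden)
        features = torch.cat(
            (embedded.squeeze(0), hidden.squeeze(0), context.squeeze(0)), dim=1
        )
        return self.fc_out(features), hidden


torch.manual_seed(0)
# A tiny stand-in encoder: embedding followed by a single-layer GRU.
enc_embedding = nn.Embedding(20, 8)
enc_rnn = nn.GRU(8, 16)
src = torch.randint(0, 20, (5, 3))        # 5 tokens, batch of 3
_, context = enc_rnn(enc_embedding(src))  # [1, 3, 16]

decoder = ContextDecoder(vocab_size=25, emb_dim=8, hid_dim=16)
hidden = context                          # initial decoder state
token = torch.zeros(3, dtype=torch.long)  # assume index 0 is <sos>
for _ in range(4):
    # The same context is reused at every step; only hidden changes.
    logits, hidden = decoder(token, hidden, context)
    token = logits.argmax(dim=1)          # greedy choice of the next token
```

Because `context` is re-supplied on each call, the hidden state no longer has to carry the whole source sentence on its own.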
-
3 - Neural Machine Translation by Jointly Learning to Align and Translate
Finally, we learn about attention by implementing Neural Machine Translation by Jointly Learning to Align and Translate. This further alleviates the "information compression" problem by allowing the decoder to "look back" at the input sentence by creating context vectors that are weighted sums of the encoder hidden states. The weights for this weighted sum are calculated via an attention mechanism, where the decoder learns to pay attention to the most relevant words in the input sentence.
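The weighted sum described above can be sketched in a few lines. The example below scores each encoder hidden state against the current decoder hidden state with a small feed-forward layer, turns the scores into weights with a softmax, and sums the encoder states using those weights. The class name `AdditiveAttention`, the dimensions and the random inputs are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Scores encoder states against a decoder state and builds a context vector."""

    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.attn = nn.Linear(enc_dim + dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: [batch_size, dec_dim]
        # enc_outputs: [src_len, batch_size, enc_dim]
        src_len = enc_outputs.shape[0]
        # Repeat the decoder state so it can be paired with every source position.
        repeated = dec_hidden.unsqueeze(0).repeat(src_len, 1, 1)
        energy = torch.tanh(self.attn(torch.cat((repeated, enc_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)  # [src_len, batch_size]
        # Normalize over source positions so the weights sum to 1 per sentence.
        weights = F.softmax(scores, dim=0)
        # Context vector: weighted sum of the encoder hidden states.
        context = (weights.unsqueeze(2) * enc_outputs).sum(dim=0)
        return context, weights


torch.manual_seed(0)
attention = AdditiveAttention(enc_dim=16, dec_dim=16, attn_dim=12)
enc_outputs = torch.randn(5, 3, 16)  # 5 source positions, batch of 3
dec_hidden = torch.randn(3, 16)
context, weights = attention(dec_hidden, enc_outputs)
```

A new `context` is computed at every decoding step, so the decoder can focus on different source words as the translation progresses.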