The simplest seq2seq (sequence-to-sequence) implementation, based on an encoder-decoder architecture, using TensorFlow v1.3
- Ubuntu 16.04 x64
- Python 3.5
- TensorFlow 1.3
This is the simplest implementation of a sequence-to-sequence model. The fake datasets are generated by generateData.py and use a vocabulary from a to z. The core idea is to predict a variable-length target sequence given a variable-length source sequence. The architecture is shown below:
Figure 1. Architecture of the model
- Run generateData.py to create random fake datasets: vocab.dat, input.dat, output.dat, and pred_logs/groundtruth.dat. (Note: groundtruth.dat is the last 10% of output.dat by default; a sketch of this step follows the list.)
- Run encoderDecoder.py to train the model and to write the predicted sequences into the folder pred_logs.
- Go into the folder pred_logs and run pytasas.py, which creates a CER log file, test_cer_tasas.log; then run drawCER.py to visualize the CER results.
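For orientation, here is a hypothetical sketch of what the data generation step might look like; the source-to-target mapping (reversal), the sample count, and the file formats are assumptions for illustration, not the actual generateData.py:

```python
import os
import random
import string

# Hypothetical sketch of the data generation step (the real generateData.py
# may differ): random a-z source sequences, one fake target per source, and
# the last 10% of output.dat copied to pred_logs/groundtruth.dat.
random.seed(0)
num_samples = 1000  # assumed sample count

with open('vocab.dat', 'w') as f:
    f.write('\n'.join(string.ascii_lowercase))

sources, targets = [], []
for _ in range(num_samples):
    src = [random.choice(string.ascii_lowercase)
           for _ in range(random.randint(3, 10))]
    sources.append(' '.join(src))
    targets.append(' '.join(reversed(src)))  # assumed mapping: reversal

with open('input.dat', 'w') as f:
    f.write('\n'.join(sources))
with open('output.dat', 'w') as f:
    f.write('\n'.join(targets))

# The last 10% of the targets serve as the test ground truth.
os.makedirs('pred_logs', exist_ok=True)
with open('pred_logs/groundtruth.dat', 'w') as f:
    f.write('\n'.join(targets[int(0.9 * num_samples):]))
```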
In my work I use a third-party command-line tool, https://github.com/mauvilsa/htrsh, to calculate the character error rate (CER); you can also use TensorFlow's built-in edit_distance function for this.
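For example, here is a minimal sketch of computing CER with tf.edit_distance; the character-to-id encoding and the example strings are illustrative assumptions:

```python
import tensorflow as tf

# Both sequences are SparseTensors of token ids; the character-to-id
# mapping below (a=0, b=1, ...) is an assumption for illustration.
hypothesis = tf.SparseTensor(                      # prediction: "a b c"
    indices=[[0, 0], [0, 1], [0, 2]],
    values=tf.constant([0, 1, 2], dtype=tf.int64),
    dense_shape=[1, 3])
truth = tf.SparseTensor(                           # ground truth: "a b d e"
    indices=[[0, 0], [0, 1], [0, 2], [0, 3]],
    values=tf.constant([0, 1, 3, 4], dtype=tf.int64),
    dense_shape=[1, 4])

# normalize=True divides the edit distance by the ground-truth length,
# which is exactly the character error rate.
cer = tf.edit_distance(hypothesis, truth, normalize=True)

with tf.Session() as sess:
    print(sess.run(cer))  # [0.5] -> 2 edits (1 substitution, 1 insertion) / 4
```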
Figure 2. CER on the test datasets
One important detail: when training the seq2seq model, the decoder's input sequence should be the ground truth (teacher forcing), but when evaluating on the test datasets, the decoder must generate its input iteratively from its own previous predictions. This is the simplest scheme; in future work I will give Scheduled Sampling a try.
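With the legacy API, this training/testing distinction maps onto the feed_previous flag of tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq. Below is a minimal sketch of the pattern, assuming hypothetical sizes and placeholder names rather than the exact code of encoderDecoder.py:

```python
import tensorflow as tf

vocab_size = 30   # assumed: a-z plus special symbols
seq_len = 10      # assumed fixed length for the sketch

# The legacy API takes one tensor per time step.
encoder_inputs = [tf.placeholder(tf.int32, [None], name='enc%d' % t)
                  for t in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, [None], name='dec%d' % t)
                  for t in range(seq_len)]
cell = tf.contrib.rnn.GRUCell(128)

# Training graph: feed_previous=False, i.e. teacher forcing -- the decoder
# reads the ground-truth symbol at every step.
train_outputs, _ = tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocab_size, num_decoder_symbols=vocab_size,
    embedding_size=64, feed_previous=False)

# Inference graph (same weights): feed_previous=True -- each step consumes
# the embedding of the previous step's argmax prediction.
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    infer_outputs, _ = tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=vocab_size, num_decoder_symbols=vocab_size,
        embedding_size=64, feed_previous=True)
```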
encoderDecoder.py uses the legacy seq2seq API in TensorFlow. If you want to try the newer TF features, run encoderDecoder_newAPI.py instead; note that it only prints its results to the screen instead of writing log files.
The architecture of the model with Bahdanau attention is:
Figure 3. Model with Bahdanau attention
In encoderDecoder_newAPI.py, however, I replace the BLSTM with a plain GRU in the encoder to keep the model as simple as possible.
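For reference, wiring a GRU encoder to a Bahdanau-attention decoder with the tf.contrib.seq2seq API in TF 1.3 could look roughly like the sketch below; all sizes and variable names are assumptions, not the exact contents of encoderDecoder_newAPI.py:

```python
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Assumed sizes; not the actual values used in encoderDecoder_newAPI.py.
batch_size, vocab_size, embed_dim, num_units = 32, 30, 64, 128

encoder_inputs = tf.placeholder(tf.int32, [batch_size, None])
encoder_lengths = tf.placeholder(tf.int32, [batch_size])
decoder_inputs = tf.placeholder(tf.int32, [batch_size, None])
decoder_lengths = tf.placeholder(tf.int32, [batch_size])

embedding = tf.get_variable('embedding', [vocab_size, embed_dim])
encoder_emb = tf.nn.embedding_lookup(embedding, encoder_inputs)
decoder_emb = tf.nn.embedding_lookup(embedding, decoder_inputs)

# Plain GRU encoder.
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.GRUCell(num_units), encoder_emb,
    sequence_length=encoder_lengths, dtype=tf.float32)

# Bahdanau (additive) attention over the encoder outputs.
attention = tf.contrib.seq2seq.BahdanauAttention(
    num_units, memory=encoder_outputs,
    memory_sequence_length=encoder_lengths)
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.contrib.rnn.GRUCell(num_units), attention,
    attention_layer_size=num_units)

# Training decoder with teacher forcing; the encoder's final state seeds
# the attention wrapper's inner cell state.
helper = tf.contrib.seq2seq.TrainingHelper(decoder_emb, decoder_lengths)
initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(
    cell_state=encoder_state)
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, initial_state, output_layer=Dense(vocab_size))
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
logits = outputs.rnn_output  # [batch, time, vocab_size]
```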
- Switch from the legacy API to higher-level wrappers
- Add an attention mechanism