# seq2seq
Attention-based sequence to sequence learning
## Dependencies
- TensorFlow 1.2+ for Python 3
- YAML and Matplotlib modules for Python 3:
```
sudo apt-get install python3-yaml python3-matplotlib
```
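A quick way to check that these dependencies are importable from Python 3 and that TensorFlow is recent enough (this snippet is only a sanity check, not part of the toolkit):

```python
# Sanity check: the toolkit needs TensorFlow 1.2+, PyYAML and Matplotlib.
from distutils.version import LooseVersion

import tensorflow as tf
import yaml        # python3-yaml
import matplotlib  # python3-matplotlib

assert LooseVersion(tf.__version__) >= LooseVersion('1.2'), tf.__version__
print('TensorFlow', tf.__version__)
```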
## How to use
Train a model (CONFIG is a YAML configuration file, such as `config/default.yaml`):
```
./seq2seq.sh CONFIG --train -v
```
Translate text using an existing model:
```
./seq2seq.sh CONFIG --decode FILE_TO_TRANSLATE --output OUTPUT_FILE
```
or for interactive decoding:
```
./seq2seq.sh CONFIG --decode
```
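The CONFIG argument is a plain YAML file; the recognized settings are the ones defined by the toolkit itself (see `config/default.yaml`). As a minimal sketch, such a file can be inspected from Python with the `yaml` module:

```python
import yaml

# Print the top-level settings of a configuration file.
# No keys are hard-coded here: whatever config/default.yaml defines is shown.
with open('config/default.yaml') as f:
    config = yaml.safe_load(f)

for key, value in sorted(config.items()):
    print('{}: {}'.format(key, value))
```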
## Example English→French model
This is the same model and dataset as Bahdanau et al. 2015.
```
config/WMT14/download.sh    # download WMT14 data into raw_data/WMT14
config/WMT14/prepare.sh     # preprocess the data, and copy the files to data/WMT14
./seq2seq.sh config/WMT14/baseline.yaml --train -v   # train a baseline model on this data
```
You should get BLEU scores similar to these (our model was trained on a single Titan X for about 4 days).
| Dev | Test | +beam | Steps | Time |
|---|---|---|---|---|
| 25.04 | 28.64 | 29.22 | 240k | 60h |
| 25.25 | 28.67 | 29.28 | 330k | 80h |
Download this model here. To use this model, just extract the archive into the `seq2seq/models` folder, and run:
```
./seq2seq.sh models/WMT14/config.yaml --decode -v
```
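The Dev and Test columns in the table above are corpus-level BLEU scores (the `+beam` column presumably reports the test score with beam-search decoding). As a rough illustration only, and not the repository's own evaluation code, corpus BLEU can be computed from a tokenized output file and its reference with NLTK (an extra dependency; the file names below are hypothetical):

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical file names: one tokenized sentence per line in each file.
with open('output.tok') as f:
    hypotheses = [line.split() for line in f]
with open('reference.tok') as f:
    references = [[line.split()] for line in f]  # single reference per sentence

print('BLEU = %.2f' % (100 * corpus_bleu(references, hypotheses)))
```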
## Example German→English model
This is the same dataset as Ranzato et al. 2015.
```
config/IWSLT14/prepare.sh
./seq2seq.sh config/IWSLT14/baseline.yaml --train -v
```
| Dev | Test | +beam | Steps |
|---|---|---|---|
| 28.32 | 25.33 | 26.74 | 44k |
The model is available for download here.
## Features
- YAML configuration files
- Beam-search decoder
- Ensemble decoding
- Multiple encoders
- Hierarchical encoder
- Bidirectional encoder
- Local attention model
- Convolutional attention model
- Detailed logging
- Periodic BLEU evaluation
- Periodic checkpoints
- Multi-task training: train on several tasks at once (e.g. French->English and German->English MT)
- Subword training and decoding
- Input binary features instead of text
- Pre-processing script: we provide a fully-featured Python script for data pre-processing (vocabulary creation, lowercasing, tokenizing, splitting, etc.)
- Dynamic RNNs: we use symbolic loops instead of statically unrolled RNNs. This means that we don't need to manually configure bucket sizes, and that model creation is much faster (see the sketch below).
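As a minimal, self-contained illustration of that last point (not code from this repository), TensorFlow's `tf.nn.dynamic_rnn` builds a symbolic while-loop, so batches of arbitrary length can be fed without pre-defined buckets:

```python
import tensorflow as tf

# A dynamically unrolled RNN encoder (TensorFlow 1.x API).
# Batch size and sequence length are both left undefined, so no bucket
# sizes have to be configured; the recurrence is unrolled at run time.
inputs = tf.placeholder(tf.float32, shape=[None, None, 512])  # [batch, time, features]
lengths = tf.placeholder(tf.int32, shape=[None])              # true length of each sequence

cell = tf.nn.rnn_cell.LSTMCell(256)
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=lengths, dtype=tf.float32)
```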
## Credits
- This project is based on TensorFlow's reference implementation
- We include some of the pre-processing scripts from Moses
- The scripts for subword units come from github.com/rsennrich/subword-nmt