neural-machine-translation keras deep-learning sequence-to-sequence theano machine-learning nmt machine-translation lstm-networks gru tensorflow attention-mechanism web-demo transformer attention-is-all-you-need attention-model attention-seq2seq

NMT-Keras

Neural Machine Translation with Keras.

Library documentation: nmt-keras.readthedocs.io

Attentional recurrent neural network NMT model

Transformer NMT model

Features (in addition to the full Keras cosmos):

❗ Multi-GPU training (only for TensorFlow).
Transformer model.
Tensorboard integration.
Online learning and Interactive neural machine translation (INMT). See the interactive NMT branch.
Attention model over the input sequence of annotations.
- Supports Bahdanau (additive) and Luong (dot-product) attention mechanisms.
- Also supports doubly stochastic attention (Eq. 14 from arXiv:1502.03044).
Peeked decoder: The previously generated word is an input of the current timestep.
Beam search decoding.
Ensemble decoding (sample_ensemble.py).
- Featuring length and source coverage normalization (reference).
Translation scoring (score.py).
Model averaging (utils/model_average.py).
Support for GRU/LSTM networks:
- Regular GRU/LSTM units.
- Conditional GRU/LSTM units in the decoder.
- Multilayered residual GRU/LSTM networks (and their Conditional version).
Label smoothing.
N-best list generation (as byproduct of the beam search process).
Unknown words replacement (see Section 3.3 from this paper)
Use of pretrained (GloVe or Word2Vec) word embedding vectors.
MLPs for initializing the RNN hidden and memory state.
Spearmint wrapper for hyperparameter optimization.
Client-server architecture for web demos:
- Regular NMT.
- Interactive NMT.
- Check out the demo!
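
The two attention scoring functions named in the feature list (additive and dot-product) can be illustrated with a small, framework-free sketch. This is not the NMT-Keras implementation: the function names (dot_score, additive_score, attend) and the toy weight matrices are assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(query, key):
    """Luong-style (dot) score: q . k"""
    return sum(q * k for q, k in zip(query, key))


def additive_score(query, key, W_q, W_k, v):
    """Bahdanau-style (additive) score: v^T tanh(W_q q + W_k k)"""
    hidden = []
    for row_q, row_k in zip(W_q, W_k):
        pre = sum(w * q for w, q in zip(row_q, query))
        pre += sum(w * k for w, k in zip(row_k, key))
        hidden.append(math.tanh(pre))
    return sum(vi * hi for vi, hi in zip(v, hidden))


def attend(query, annotations, score_fn):
    """Return attention weights and the weighted sum of the annotations."""
    weights = softmax([score_fn(query, h) for h in annotations])
    dim = len(annotations[0])
    context = [sum(w * h[d] for w, h in zip(weights, annotations))
               for d in range(dim)]
    return weights, context


# Toy example: one decoder state attending over two source annotations.
query = [1.0, 0.0]
annotations = [[1.0, 0.0], [0.0, 1.0]]
dot_weights, dot_context = attend(query, annotations, dot_score)

# Hypothetical parameters for the additive variant (identity projections).
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
add_weights, add_context = attend(
    query, annotations, lambda q, k: additive_score(q, k, W_q, W_k, v))
print(dot_weights, add_weights)
```

In both variants the weights form a distribution over source positions; the variants differ only in how each position is scored.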

Installation

Assuming that you have pip installed and up to date (version >18), install the library by running:

git clone https://github.com/lvapeab/nmt-keras
cd nmt-keras
pip install -e .

Requirements

NMT-Keras requires the following libraries:

Our version of Keras (Recommended v. 2.0.7 or newer).
Multimodal Keras Wrapper (v. 2.0 or newer). (Documentation and tutorial).

For accelerating the training and decoding on CUDA GPUs, you can optionally install:

CuDNN.
CuPy.

For evaluating with additional metrics (METEOR, TER, etc.), you can use the Coco-caption evaluation package and set METRICS='coco' in the config.py file. This package requires Java (version 1.8.0 or newer).

Usage

Training

Set a training configuration in the config.py script. Each parameter is commented. See the documentation file for further info about each specific hyperparameter. You can also specify the parameters when calling the main.py script, following the syntax Key=Value.
Train the model:

python main.py
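
As an illustration of the Key=Value syntax, the sketch below shows one way such command-line overrides could be merged into a dictionary of parameters. It is an assumption about the general idea, not the code that main.py actually runs; the helper name apply_overrides and the parameter names in the example are hypothetical placeholders.

```python
import ast


def apply_overrides(params, args):
    """Return a copy of `params` updated with 'Key=Value' strings."""
    updated = dict(params)
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep or not key:
            raise ValueError("Expected Key=Value, got %r" % arg)
        try:
            # Interpret numbers, booleans, lists, quoted strings, etc.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            value = raw
        updated[key] = value
    return updated


# Hypothetical defaults, standing in for values defined in a config file.
defaults = {"MAX_EPOCH": 500, "BATCH_SIZE": 50, "MODEL_TYPE": "SomeModel"}
overridden = apply_overrides(defaults, ["MAX_EPOCH=10", "MODEL_TYPE=OtherModel"])
print(overridden)
```

Under this sketch, a call such as python main.py MAX_EPOCH=10 would change only the named parameter and leave the rest of the configuration untouched.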

Decoding

Once the model is trained, we can translate new text using the sample_ensemble.py script. Please refer to the ensembling_tutorial for more details about this script. In short, to use the models from the first two epochs to translate the examples/EuTrans/test.en file, run:

python sample_ensemble.py \
             --models trained_models/tutorial_model/epoch_1 \
                      trained_models/tutorial_model/epoch_2 \
             --dataset datasets/Dataset_tutorial_dataset.pkl \
             --text examples/EuTrans/test.en

Scoring

The score.py script can be used to obtain the (-log)probabilities of a parallel corpus. Its syntax is the following:

python score.py --help
usage: Use several translation models for scoring source--target pairs
       [-h] -ds DATASET [-src SOURCE] [-trg TARGET] [-s SPLITS [SPLITS ...]]
       [-d DEST] [-v] [-c CONFIG] --models MODELS [MODELS ...]
optional arguments:
    -h, --help            show this help message and exit
    -ds DATASET, --dataset DATASET
                            Dataset instance with data
    -src SOURCE, --source SOURCE
                            Text file with source sentences
    -trg TARGET, --target TARGET
                            Text file with target sentences
    -s SPLITS [SPLITS ...], --splits SPLITS [SPLITS ...]
                            Splits to sample. Should be already included into the
                            dataset object.
    -d DEST, --dest DEST  File to save scores in
    -v, --verbose         Be verbose
    -c CONFIG, --config CONFIG
                            Config pkl for loading the model configuration. If not
                            specified, hyperparameters are read from config.py
    --models MODELS [MODELS ...]
                            path to the models
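
To make the "(-log)probability" of a sentence pair concrete, here is a minimal sketch of how such a score can be derived from per-token probabilities. It does not reproduce score.py; the function name sentence_neg_log_prob and the example probabilities are assumptions for illustration.

```python
import math


def sentence_neg_log_prob(token_probs):
    """Sum of -log p over the target tokens of one sentence.

    `token_probs` holds the probability a model assigns to each reference
    target token, given the source sentence and the preceding target tokens.
    """
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1], got %r" % p)
    return -sum(math.log(p) for p in token_probs)


# Made-up probabilities for a three-token target sentence.
score = sentence_neg_log_prob([0.5, 0.25, 0.5])
print(score)
```

Lower values mean the model finds the target sentence more likely given the source.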

Advanced features

Other features such as online learning or interactive NMT protocols are implemented in the interactiveNMT branch.

Resources

examples/documentation/nmt-keras_paper.pdf contains a general overview of the NMT-Keras framework.
In examples/documentation/neural_machine_translation.pdf you'll find an overview of an attentional NMT system.
In the examples folder you'll find two Colab notebooks explaining the basic usage of this library:
An introduction to a complete NMT experiment.
A dissected NMT model.
In the examples/configs folder you'll find two examples of configs for larger models.

Citation

If you use this toolkit in your research, please cite:

@article{nmt-keras:2018,
 journal = {The Prague Bulletin of Mathematical Linguistics},
 title = {{NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning}},
 author = {\'{A}lvaro Peris and Francisco Casacuberta},
 year = {2018},
 volume = {111},
 pages = {113--124},
 doi = {10.2478/pralin-2018-0010},
 issn = {0032-6585},
 url = {https://ufal.mff.cuni.cz/pbml/111/art-peris-casacuberta.pdf}
}

NMT-Keras was used in a number of papers.

Acknowledgement

Much of this library has been developed together with Marc Bolaños (web page) for other sequence-to-sequence problems.

To see other projects following the same philosophy and style as NMT-Keras, take a look at:

TMA: Egocentric captioning based on temporally-linked sequences.

VIBIKNet: Visual question answering.

ABiViRNet: Video description.

Sentence SelectioNN: Sentence classification and selection.

DeepQuest: State-of-the-art models for multi-level Quality Estimation.

Warning!

The Theano backend is no longer tested, although it should work. It has one known issue: when running NMT-Keras, it will show the following message:

[...]
raise theano.gof.InconsistencyError("Trying to reintroduce a removed node")
InconsistencyError: Trying to reintroduce a removed node

This is not a critical error: the model keeps working and the message is safe to ignore. However, if you want the message to be gone, use the Theano flag optimizer_excluding=scanOp_pushout_output.
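
One common way to pass Theano flags is the THEANO_FLAGS environment variable, which Theano reads when it is first imported. The sketch below appends the flag to any flags that are already set; the helper name add_theano_flag is hypothetical and not part of NMT-Keras or Theano.

```python
import os


def add_theano_flag(flag, environ=os.environ):
    """Append `flag` to THEANO_FLAGS unless it is already present."""
    existing = environ.get("THEANO_FLAGS", "")
    flags = [f for f in existing.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]


# Must run before `import theano` (or before importing anything that
# imports Theano), because the variable is read at import time.
add_theano_flag("optimizer_excluding=scanOp_pushout_output")
```

The same effect can be obtained from the shell by setting THEANO_FLAGS before launching python main.py.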

Contact

Álvaro Peris (web page): lvapeab@prhlt.upv.es

About

MIT License

Languages

Python 97.7%, Shell 2.3%