# Code for paper: XXX
## File Structure

- `nmt.py`: main file
- `vocab.py`: script used to generate a `.bin` vocabulary file from a parallel corpus
- `util.py`: script containing helper functions
## Usage

See EXAMPLE_SETUP.md
- PyTorch 0.4 is used.
- Add the `expected_bleu` module to `PYTHONPATH`:

      export PYTHONPATH="${PYTHONPATH}:/<PATH>/<TO>/<repo(bleu_lower_bound)>/expected_bleu"
If you only want to plug this loss into your own MT system, see the `expected_bleu` directory. If you want to reproduce the experiments, read on.
- (See NOTE below.) Run the script (borrowed from the Harvard NLP repo) to download and preprocess the IWSLT'14 dataset:

      $ cd preprocessing
      $ source prepareData.sh

  NOTE: this script requires Lua and LuaTorch. As an alternative, you can download all the necessary files (the `data` directory) from this repo or via this link.
- Generate vocabulary files:

      python vocab.py

  Example:

      python vocab.py --train_src data/train.de-en.de --train_tgt data/train.de-en.en --output data/vocab.bin
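  `vocab.py`'s internals are not shown here, but a typical vocabulary-extraction step (a minimal sketch with illustrative function names and special tokens, not the script's actual API) counts whitespace tokens in the training corpus, keeps the most frequent ones, and pickles the result to a `.bin` file:

  ```python
  import pickle
  from collections import Counter

  def build_vocab(corpus_path, max_size=50000, min_freq=1):
      """Count whitespace tokens in a text file and keep the most frequent ones."""
      counter = Counter()
      with open(corpus_path, encoding="utf-8") as f:
          for line in f:
              counter.update(line.split())
      # Reserve the first ids for special tokens, then add frequent words.
      vocab = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3}
      for word, freq in counter.most_common(max_size):
          if freq < min_freq:
              break
          vocab.setdefault(word, len(vocab))
      return vocab

  def save_vocab(vocab, path):
      """Serialize the vocabulary dict, mirroring the .bin output format."""
      with open(path, "wb") as f:
          pickle.dump(vocab, f)
  ```

  The real script builds separate source and target vocabularies from `--train_src` and `--train_tgt` and writes them to the `--output` path.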
- Vanilla maximum-likelihood training:

      . scripts/run_mle.sh

- BLEU lower-bound (LB) training:

      . scripts/run_custom_train

- REINFORCE training:

      . scripts/run_custom_train3
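The REINFORCE variant maximizes the expected sentence-level reward (e.g. BLEU) via the score-function estimator, grad E[R] = E[R * grad log p(y)]. The following dependency-free, single-step sketch illustrates the estimator (with a baseline for variance reduction); it is an illustration of the technique, not the repo's code:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, reward, baseline=0.0, n_samples=1000, seed=0):
    """Score-function gradient estimate for a one-step categorical policy.

    grad E[R] is estimated as the mean over samples of
    (R(a) - baseline) * d/dlogits log p(a).
    `reward` maps a sampled action index to a scalar (BLEU in the paper's setting).
    """
    rng = random.Random(seed)
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        # Sample an action from the current policy.
        a = rng.choices(range(len(logits)), weights=probs)[0]
        adv = reward(a) - baseline
        # d log p(a) / d logit_k = 1[k == a] - p_k  (softmax score function)
        for k, p in enumerate(probs):
            grad[k] += adv * ((1.0 if k == a else 0.0) - p)
    return [g / n_samples for g in grad]
```

In the actual training scripts the sampled "action" is a whole translation, the reward is its BLEU score against the reference, and the gradient flows through the model's log-probabilities.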
## Training (one experiment)

    bash scripts/run_mle.sh <gpu_id>
    bash scripts/run_custom_train.sh models/model_name <gpu_id>
To evaluate a trained model:

    bash scripts/test_mle.sh <path to model> <gpu_id> <mode_name>

where `<mode_name>` is either `test` or `train`, depending on which dataset split you want to evaluate on. Results are written to the `logs` directory.
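The test script reports BLEU on the chosen split. For reference, a minimal corpus-level BLEU with uniform 4-gram weights and a brevity penalty (a sketch of the standard metric, not the repo's scorer) looks like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)
```

Production evaluations typically use `multi-bleu.perl` or sacrebleu rather than a hand-rolled scorer; this sketch is only to make the reported numbers concrete.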
For running multiple experiments, see these scripts:

- `run_experiments.sh`: runs the `run_all.sh` script on each GPU you list (see the file content).
- `run_all.sh`: training script (comment/uncomment the different phases of training).