marian-nmt / marian

Fast Neural Machine Translation in C++

Home Page: https://marian-nmt.github.io

About Parameters --after-epochs

LittleRooki opened this issue

When I ran the Transformer example, I set the training parameters as follows:
$MARIAN_TRAIN \
    --model model/model.npz --type transformer \
    --train-sets data/corpus.bpe.en data/corpus.bpe.de \
    --max-length 100 \
    --vocabs model/vocab.ende.yml model/vocab.ende.yml \
    --mini-batch-fit -w 6000 --maxi-batch 1000 \
    --early-stopping 10 --cost-type=ce-mean-words \
    --after-epochs 2 \
    --valid-freq 5000 --save-freq 5000 --disp-freq 500 \
    --valid-metrics ce-mean-words perplexity translation \
    --valid-sets data/valid.bpe.en data/valid.bpe.de \
    --valid-script-path "bash ./scripts/validate.sh" \
    --valid-translation-output data/valid.bpe.en.output --quiet-translation \
    --valid-mini-batch 64 \
    --beam-size 6 --normalize 0.6 \
    --log model/train.log --valid-log model/valid.log \
    --enc-depth 6 --dec-depth 6 \
    --transformer-heads 8 \
    --transformer-postprocess-emb d \
    --transformer-postprocess dan \
    --transformer-dropout 0.1 --label-smoothing 0.1 \
    --learn-rate 0.0003 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report \
    --optimizer-params 0.9 0.98 1e-09 --clip-norm 5 \
    --tied-embeddings-all \
    --devices $GPUS --sync-sgd --seed 1111 \
    --exponential-smoothing

I added the option --after-epochs 2, but why did it start training for epoch 3? At the same time, I got an error like this:

[2021-04-27 13:06:02] Starting data epoch 3 in logical epoch 3
[2021-04-27 13:06:02] Training finished
[2021-04-27 13:06:03] [valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
[2021-04-27 13:06:04] [valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
[2021-04-27 13:06:11] [valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best
[2021-04-27 13:06:12] Saving model weights and runtime parameters to model/model.npz.orig.npz
[2021-04-27 13:06:14] Saving model weights and runtime parameters to model/model.npz
[2021-04-27 13:06:16] Saving Adam parameters to model/model.npz.optimizer.npz

Error: Model file does not exist: model/model.iter23421.npz
Error: Aborted from void marian::ConfigValidator::validateOptionsTranslation() const in /home/caohang/marian/src/common/config_validator.cpp:57

What should I do?

Hm, check if the model folder exists?

No, ignore that. The other files were saved. That seems to be a problem with the translation validator. @snukky can you take a look?

Yes, model.iter5000.npz, model.iter10000.npz, model.iter15000.npz and model.iter20000.npz were saved. I set --after-epochs 2, so I don't know why it showed:
[valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
[valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
[valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best

This might be causing the problem.
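The file names listed above can be checked against the update counter from the log. The sketch below is only arithmetic on the values in this thread (--save-freq 5000, final update 23421). It assumes that iteration checkpoints are written only at multiples of --save-freq, which is what the saved files suggest. The helper name is made up for illustration and is not part of Marian.

```python
def saved_iteration_checkpoints(final_update, save_freq):
    """Updates at which a periodic checkpoint would have been written.

    Assumption: one checkpoint per multiple of save_freq, nothing else.
    """
    return list(range(save_freq, final_update + 1, save_freq))


final_update = 23421  # "Up. 23421" in the log
save_freq = 5000      # --save-freq 5000

saved = saved_iteration_checkpoints(final_update, save_freq)
names = [f"model.iter{u}.npz" for u in saved]

print(names)
# The file named in the error message is not among the periodic checkpoints:
print(f"model.iter{final_update}.npz" in names)
```

Under that assumption the list matches the four files that were reported as saved, and model.iter23421.npz is not one of them, which is consistent with the "Model file does not exist" error appearing only at the final validation.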

This is the final validation after training stopped. That's actually expected, but the error is weird.
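To illustrate why the final validation is labelled "Ep. 3" even with --after-epochs 2, here is a toy loop. It is not Marian's source code. It assumes the epoch counter starts at 1, is incremented when an epoch ends, and that training stops once the counter exceeds the limit. The function name and the updates_per_epoch value are hypothetical.

```python
def run_training(after_epochs, updates_per_epoch):
    """Toy training loop that mimics the order of the log lines above."""
    epoch = 1      # assumed: epochs are counted from 1
    updates = 0
    log = []
    while True:
        log.append(f"Starting data epoch {epoch}")
        # Assumed stop condition: counter has moved past the limit.
        if after_epochs > 0 and epoch > after_epochs:
            log.append("Training finished")
            break
        updates += updates_per_epoch  # train for one full epoch
        epoch += 1                    # counter advances at the end of the epoch
    # The final validation reports whatever the counter currently holds.
    log.append(f"[valid] Ep. {epoch} : Up. {updates}")
    return log


for line in run_training(after_epochs=2, updates_per_epoch=10):
    print(line)
```

In this sketch two full epochs are trained, the counter moves to 3, the loop stops immediately, and the last validation line carries "Ep. 3". That is the same pattern as "Starting data epoch 3" followed directly by "Training finished" in the log.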