About the parameter --after-epochs
LittleRooki opened this issue
When I ran the Transformer example, I used the following training command:
$MARIAN_TRAIN \
    --model model/model.npz --type transformer \
    --train-sets data/corpus.bpe.en data/corpus.bpe.de \
    --max-length 100 \
    --vocabs model/vocab.ende.yml model/vocab.ende.yml \
    --mini-batch-fit -w 6000 --maxi-batch 1000 \
    --early-stopping 10 --cost-type=ce-mean-words \
    --after-epochs 2 \
    --valid-freq 5000 --save-freq 5000 --disp-freq 500 \
    --valid-metrics ce-mean-words perplexity translation \
    --valid-sets data/valid.bpe.en data/valid.bpe.de \
    --valid-script-path "bash ./scripts/validate.sh" \
    --valid-translation-output data/valid.bpe.en.output --quiet-translation \
    --valid-mini-batch 64 \
    --beam-size 6 --normalize 0.6 \
    --log model/train.log --valid-log model/valid.log \
    --enc-depth 6 --dec-depth 6 \
    --transformer-heads 8 \
    --transformer-postprocess-emb d \
    --transformer-postprocess dan \
    --transformer-dropout 0.1 --label-smoothing 0.1 \
    --learn-rate 0.0003 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report \
    --optimizer-params 0.9 0.98 1e-09 --clip-norm 5 \
    --tied-embeddings-all \
    --devices $GPUS --sync-sgd --seed 1111 \
    --exponential-smoothing
I added the option --after-epochs 2, so why did it start epoch 3? At the same time, I got this error:
[2021-04-27 13:06:02] Starting data epoch 3 in logical epoch 3
[2021-04-27 13:06:02] Training finished
[2021-04-27 13:06:03] [valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
[2021-04-27 13:06:04] [valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
[2021-04-27 13:06:11] [valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best
[2021-04-27 13:06:12] Saving model weights and runtime parameters to model/model.npz.orig.npz
[2021-04-27 13:06:14] Saving model weights and runtime parameters to model/model.npz
[2021-04-27 13:06:16] Saving Adam parameters to model/model.npz.optimizer.npz
Error: Model file does not exist: model/model.iter23421.npz
Error: Aborted from void marian::ConfigValidator::validateOptionsTranslation() const in /home/caohang/marian/src/common/config_validator.cpp:57
What should I do?
Hm, check if the model folder exists?
No, ignore that. The other files were saved. That seems to be a problem with the translation validator. @snukky can you take a look?
Yes, model.iter5000.npz, model.iter10000.npz, model.iter15000.npz and model.iter20000.npz were saved. I set --after-epochs 2, so I don't know why it showed:
[valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
[valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
[valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best
This might be causing the problem.
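The saved files listed above follow a simple pattern. The sketch below is a minimal illustration that assumes a model.iter<N>.npz checkpoint is written at every multiple of --save-freq. That rule is inferred from the files reported in this thread, not taken from Marian's source, and periodic_checkpoints is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Marian): list the model.iter<N>.npz names
# expected if a checkpoint is written at every multiple of --save-freq.
# This rule is an assumption inferred from the files reported in this thread.
periodic_checkpoints() {
  local save_freq=$1 last_update=$2 prefix=${3:-model/model}
  local n
  for (( n = save_freq; n <= last_update; n += save_freq )); do
    echo "${prefix}.iter${n}.npz"
  done
}

# Values taken from the command (--save-freq 5000) and the log (Up. 23421).
periodic_checkpoints 5000 23421

# 23421 is not a multiple of 5000, so no periodic checkpoint carries that number.
if periodic_checkpoints 5000 23421 | grep -qxF 'model/model.iter23421.npz'; then
  echo "iter23421 checkpoint expected"
else
  echo "no iter23421 checkpoint expected"
fi
```

Under that assumption, the last periodic checkpoint is model/model.iter20000.npz. According to the log, the end-of-training save writes only model.npz, model.npz.orig.npz and model.npz.optimizer.npz. Nothing on disk therefore matches the model/model.iter23421.npz path named in the error.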
This is the final validation after training stopped. That's actually expected, but the error is weird.
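This sketch shows one way the final validation can be labeled Ep. 3 even with --after-epochs 2. It is a simplified, hypothetical model of the training loop, not Marian's scheduler code. It assumes the epoch counter is advanced as soon as a data epoch ends, and that the --after-epochs check runs after that. The function name simulate_training and the value of 1000 updates per epoch are illustrative only and do not come from the log.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of an epoch-limited training loop. This is
# NOT Marian's scheduler code; it only assumes that the epoch counter is
# advanced when a data epoch ends and that the --after-epochs check runs
# afterwards.
simulate_training() {
  local after_epochs=$1 updates_per_epoch=$2
  local epoch=1 update=0
  if (( after_epochs < 1 )); then
    echo "this sketch only models a positive epoch limit" >&2
    return 1
  fi
  while true; do
    # One full pass over the training data.
    (( update += updates_per_epoch ))
    # The data epoch ended: advance the counter first...
    (( epoch += 1 ))
    echo "Starting data epoch ${epoch} in logical epoch ${epoch}"
    # ...then compare the new value with the limit.
    if (( epoch > after_epochs )); then
      echo "Training finished"
      break
    fi
  done
  # The final validation reports the already-advanced counter.
  echo "[valid] Ep. ${epoch} : Up. ${update}"
}

simulate_training 2 1000
```

In this model, the loop exits after exactly two passes over the data, but the counter already reads 3 at that point. The last three lines therefore appear in the same order as in the log: "Starting data epoch 3", "Training finished", and then a validation line tagged Ep. 3. No third epoch of training takes place.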