Validation set is decoded multiple times for ngram level metrics
AmitMY opened this issue
Amit Moryossef commented
Bug description
When training a model with --valid-metrics chrf ce-mean-words bleu-detok, it seems like the development set is fully translated twice in order to calculate chrf and bleu-detok separately.
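A rough sketch of the expected behaviour: decode the validation set once and reuse the same hypotheses for every translation-based metric (chrf, bleu-detok), while ce-mean-words is computed from the model's cross-entropy on the references and needs no decoding. This is not Marian's code. `validate`, `toy_translate`, `exact_match` and `length_ratio` are hypothetical stand-ins chosen for illustration, and the toy scorers are not real chrF or BLEU implementations.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for this sketch (not Marian's API).
Translator = Callable[[List[str]], List[str]]
Scorer = Callable[[List[str], List[str]], float]


def validate(
    sources: List[str],
    references: List[str],
    translate: Translator,
    metrics: Dict[str, Scorer],
) -> Dict[str, float]:
    """Score several translation-based metrics from a single decode of the dev set."""
    if not metrics:
        return {}
    hypotheses = translate(sources)  # the only full pass over the validation set
    return {name: score(hypotheses, references) for name, score in metrics.items()}


# --- toy usage: count how many times the "decoder" runs ---
decode_calls = 0


def toy_translate(batch: List[str]) -> List[str]:
    """Hypothetical decoder stand-in that just upper-cases its input."""
    global decode_calls
    decode_calls += 1
    return [s.upper() for s in batch]


def exact_match(hyps: List[str], refs: List[str]) -> float:
    """Percentage of hypotheses identical to their reference (toy metric)."""
    return 100.0 * sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def length_ratio(hyps: List[str], refs: List[str]) -> float:
    """Total hypothesis length over total reference length (toy metric)."""
    return sum(len(h) for h in hyps) / sum(len(r) for r in refs)


scores = validate(
    ["a b", "c"],
    ["A B", "X"],
    toy_translate,
    {"exact": exact_match, "len-ratio": length_ratio},
)
print(scores, decode_calls)  # {'exact': 50.0, 'len-ratio': 1.0} 1
```

However many translation-based metrics are listed, `toy_translate` runs exactly once per validation round.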
How to reproduce
Train a model with --valid-metrics chrf ce-mean-words bleu-detok
Context
- Marian version: https://github.com/marian-nmt/marian-dev/tree/e8a1a2530fb84cbff7383302ebca393e5875c441 (browsermt, bergamot project)
Log
[2023-01-16 15:58:54] [valid] First sentence's tokens as scored:
[2023-01-16 15:58:54] [valid] Decoding validation set with SentencePieceVocab for scoring
[2023-01-16 15:58:54] [valid] Hyp: M 5 3 x 5 0 0 ......
[2023-01-16 15:58:54] [valid] Ref: M 5 5 1 x 5 6 0 .......
[2023-01-16 16:24:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:27] [valid] Ep. 1 : Up. 300 : ce-mean-words : 2.38221 : new best
[2023-01-16 17:16:28] [valid] Ep. 1 : Up. 300 : bleu-detok : 0 : new best
[2023-01-16 17:22:04] [valid] Ep. 1 : Up. 600 : chrf : 15.978 : new best
*chrf appears twice in the log because of a bug in browsermt, which has since been fixed.
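The timestamps in the log above back this up. Taking the gap between consecutive score lines as a rough estimate of how long each metric took (an assumption, since the log only records when each score was printed), every translation-based metric costs about 26 minutes, while ce-mean-words takes 15 seconds:

```python
from datetime import datetime

# Score lines for the Up. 300 validation round, copied from the log above.
events = [
    ("chrf", "2023-01-16 16:24:12"),
    ("chrf (duplicate)", "2023-01-16 16:50:12"),
    ("ce-mean-words", "2023-01-16 16:50:27"),
    ("bleu-detok", "2023-01-16 17:16:28"),
]

fmt = "%Y-%m-%d %H:%M:%S"
times = [datetime.strptime(ts, fmt) for _, ts in events]

# Seconds elapsed between each score line and the one before it.
gaps = {
    name: (t - prev).total_seconds()
    for (name, _), t, prev in zip(events[1:], times[1:], times[:-1])
}

for name, secs in gaps.items():
    print(f"{name:>16}: {secs:.0f} s")
```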