Validation set is decoded multiple times for ngram level metrics

AmitMY opened this issue 2 years ago

Bug description

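When training a model with --valid-metrics chrf ce-mean-words bleu-detok, the development set appears to be translated in full twice: once to calculate chrf and once more to calculate bleu-detok. Both metrics are computed from the same decoded output, so a single decoding pass should be enough.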
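To illustrate the expected behaviour, here is a minimal Python sketch that decodes once and reuses the hypotheses for every n-gram metric. It is not Marian's code: `validate_once`, `toy_translate`, and the two toy metrics are hypothetical names that stand in for the real decoder and for chrf / bleu-detok.

```python
from typing import Callable, Dict, List

# Hypothetical metric signature: (hypotheses, references) -> score.
Metric = Callable[[List[str], List[str]], float]


def validate_once(
    translate: Callable[[List[str]], List[str]],
    sources: List[str],
    references: List[str],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Decode the validation set a single time, then score every metric on it."""
    hypotheses = translate(sources)  # the expensive step, run exactly once
    return {name: metric(hypotheses, references) for name, metric in metrics.items()}


# --- Toy stand-ins, for illustration only ---
calls = {"n": 0}  # counts how often the "decoder" is invoked


def toy_translate(batch: List[str]) -> List[str]:
    calls["n"] += 1
    return [s.upper() for s in batch]


def toy_exact_match(hyps: List[str], refs: List[str]) -> float:
    # Fraction of hypotheses identical to their reference.
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)


def toy_length_ratio(hyps: List[str], refs: List[str]) -> float:
    # Total hypothesis length divided by total reference length.
    return sum(len(h) for h in hyps) / sum(len(r) for r in refs)


scores = validate_once(
    toy_translate,
    sources=["a b", "c d"],
    references=["A B", "C X"],
    metrics={"toy-exact": toy_exact_match, "toy-length": toy_length_ratio},
)
print(scores, "decoder calls:", calls["n"])
```

In this sketch the decoder is called once no matter how many decoding-based metrics are requested, which is the behaviour the report asks for.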

How to reproduce

Train a model with --valid-metrics chrf ce-mean-words bleu-detok

Context

Marian version: https://github.com/marian-nmt/marian-dev/tree/e8a1a2530fb84cbff7383302ebca393e5875c441 (browsermt, bergamot project)

Log

[2023-01-16 15:58:54] [valid] First sentence's tokens as scored:
[2023-01-16 15:58:54] [valid] Decoding validation set with SentencePieceVocab for scoring
[2023-01-16 15:58:54] [valid]   Hyp: M 5 3 x 5 0 0 ......
[2023-01-16 15:58:54] [valid]   Ref: M 5 5 1 x 5 6 0 .......
[2023-01-16 16:24:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:12] [valid] Ep. 1 : Up. 300 : chrf : 14.717 : new best
[2023-01-16 16:50:27] [valid] Ep. 1 : Up. 300 : ce-mean-words : 2.38221 : new best
[2023-01-16 17:16:28] [valid] Ep. 1 : Up. 300 : bleu-detok : 0 : new best
[2023-01-16 17:22:04] [valid] Ep. 1 : Up. 600 : chrf : 15.978 : new best

* chrf is logged twice because of a bug in browsermt, which has since been fixed.
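The gaps between the [valid] lines in the log can be computed with a short Python sketch. It assumes each line is written when its metric finishes, so the gap to the previous line approximates that metric's cost; the timestamps are copied from the log, and the labels and `elapsed_seconds` helper are made up for illustration.

```python
from datetime import datetime

# (label, timestamp) pairs copied from the [valid] lines in the log.
events = [
    ("validation start", "2023-01-16 15:58:54"),
    ("chrf", "2023-01-16 16:24:12"),
    ("chrf (duplicate line)", "2023-01-16 16:50:12"),
    ("ce-mean-words", "2023-01-16 16:50:27"),
    ("bleu-detok", "2023-01-16 17:16:28"),
]


def elapsed_seconds(entries):
    """Return seconds between each entry and the one logged before it."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for _, ts in entries]
    return {
        entries[i][0]: int((times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(entries))
    }


gaps = elapsed_seconds(events)
for label, seconds in gaps.items():
    print(f"{label}: {seconds // 60} min {seconds % 60} s")
```

Under that assumption, chrf and bleu-detok each take about 25 to 26 minutes while ce-mean-words takes 15 seconds, which is consistent with a full decoding pass being repeated for each n-gram metric.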