detokenize output

Question


sonalsannigrahi opened this issue 3 years ago

Hello! I am running some baseline tests on trained models and was wondering whether there is a script to detokenize the output. I trained an en-ne model on the BPE-segmented text from the provided data, ran inference on the test set to produce a pred.txt file, and now want to detokenize that output so I can compute BLEU scores.

Also, one more thing: which test files are we supposed to use? There are two directories, data and data-bin. Currently I am using the wikipedia.test.ne-en.$L1 / $L2 files from data. Is this correct?

Sonal Sannigrahi · Answer 1 · Mon Mar 15 2021 05:00:33 GMT+0800 (China Standard Time)

(For those wondering: I found an old closed issue about the test file locations, and I detokenized the output using the instructions in the SentencePiece README.)
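For anyone landing here later, below is a minimal Python sketch of the detokenization step. It assumes pred.txt holds one hypothesis per line as space-separated SentencePiece pieces using the default ▁ (U+2581) word-boundary marker. If the file came straight from fairseq-generate, the H-/S- prefixes and scores would need to be stripped first. The rule applied (drop the spaces between pieces, then turn each ▁ back into a space) is the same one fairseq uses for its `sentencepiece` post-processing option, and it needs no model file. Running `spm_decode` with the trained model, as described in the SentencePiece README, remains the more robust route. The script name `detok.py` and the function names are hypothetical.

```python
import sys

# SentencePiece marks word boundaries with U+2581 ("LOWER ONE EIGHTH BLOCK").
SPM_SPACE = "\u2581"


def detokenize_spm(line: str) -> str:
    """Turn one line of space-separated SentencePiece pieces back into text.

    Removes the spaces between pieces, converts each boundary marker into a
    real space, and strips the leading space (and any trailing newline).
    """
    return line.replace(" ", "").replace(SPM_SPACE, " ").strip()


def detokenize_file(in_path: str, out_path: str) -> None:
    """Detokenize every line of in_path and write the result to out_path."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(detokenize_spm(line) + "\n")


# Hypothetical CLI: python detok.py pred.txt pred.detok.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    detokenize_file(sys.argv[1], sys.argv[2])
```

Usage: `python detok.py pred.txt pred.detok.txt`. BLEU (for example with sacreBLEU) should then be computed against a reference that is either untokenized or detokenized the same way, so that both sides are on equal footing.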