detokenize output
sonalsannigrahi opened this issue
Hello! I'm running some baseline tests on trained models and was wondering whether there is a script to detokenize the output. I trained an en-ne model on the BPE-tokenized text from the provided data, ran inference on the test set to produce a pred.txt file, and now want to detokenize that output so I can compute BLEU scores.
Also, which test files are we supposed to use? There are two directories, data and data-bin. I'm currently using the wikipedia.test.ne-en.$L1 or $L2 files from data. Is this correct?
(For anyone wondering: I found an old closed issue about the test file locations, and I detokenized the output using the instructions in the SentencePiece README.)
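For anyone landing here later, this is a minimal sketch of what that detokenization amounts to. It assumes pred.txt holds one hypothesis per line as space-separated SentencePiece pieces, where `▁` (U+2581) marks the start of a word. The SentencePiece CLI route is `spm_decode --model=<your .model file> --input_format=piece`. The plain-Python version below only covers ordinary pieces and does not handle special cases such as byte-fallback tokens. The function names and file names are hypothetical.

```python
# U+2581 ("LOWER ONE EIGHTH BLOCK") is the word-boundary marker SentencePiece
# prepends to pieces that begin a new word.
SPM_SPACE = "\u2581"


def detokenize_spm(line: str) -> str:
    """Undo SentencePiece segmentation on one line of space-separated pieces.

    Hypothetical helper: assumes plain pieces only (no byte-fallback tokens).
    """
    # Remove the spaces between pieces, then turn each boundary marker back
    # into a real space and trim the leading one.
    return line.replace(" ", "").replace(SPM_SPACE, " ").strip()


def detokenize_file(src_path: str, dst_path: str) -> None:
    """Detokenize src_path line by line into dst_path (hypothetical paths)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(detokenize_spm(line.rstrip("\n")) + "\n")
```

For example, `detokenize_file("pred.txt", "pred.detok.txt")` writes the detokenized hypotheses. You can then score them against the untokenized reference with a tool such as sacreBLEU (`sacrebleu ref.txt < pred.detok.txt`, where `ref.txt` stands in for the raw target-side test file).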