facebookresearch / flores

Facebook Low Resource (FLoRes) MT Benchmark

detokenize output

sonalsannigrahi opened this issue

Hello! I am running some baseline tests on trained models and was wondering whether there is a script to detokenize the output. I trained an en-ne model on the BPE-encoded text from the provided data, ran inference on the test set to produce a pred.txt file, and now want to detokenize that output to compute BLEU scores.

Also, one more thing: which test files are we supposed to use? There are two directories, data and data-bin. Currently I am using the wikipedia.test.ne-en.$L1 or $L2 files from data. Is this correct?

(For those wondering: I found an old closed issue on the test file locations, and detokenized using the instructions from the SentencePiece README.)
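
For anyone landing here later, this is a minimal sketch of that detokenization step in plain Python. It is not a script from this repo. It assumes pred.txt holds one hypothesis per line as space-separated SentencePiece pieces, with "▁" (U+2581) marking word boundaries. The function names and the pred.detok.txt output name are made up for illustration. The SentencePiece README route (spm_decode with the trained model) is the more faithful option if you have the model file at hand.

```python
def detokenize_line(line: str) -> str:
    """Undo SentencePiece segmentation on one line of space-separated pieces.

    Assumption: the spaces in the line only separate pieces, and the meta
    symbol U+2581 marks where the original spaces were.
    """
    # Drop the piece separators, then turn each meta symbol back into a space.
    return line.replace(" ", "").replace("\u2581", " ").strip()


def detokenize_file(src_path: str, dst_path: str) -> None:
    """Detokenize a file line by line (hypothetical helper, not part of FLoRes)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(detokenize_line(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    import os

    # pred.txt is the file name from the post; the output name is an assumption.
    if os.path.exists("pred.txt"):
        detokenize_file("pred.txt", "pred.detok.txt")
```

The detokenized file can then be scored against the raw (not BPE-encoded) reference for the test set.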