facebookresearch / flores

Facebook Low Resource (FLoRes) MT Benchmark

Scope for adding a new language: Bodo

sanjibnarzary opened this issue

Hi, I am interested in translating the 3001 sentences into our own language, Bodo (brx), by outsourcing the work. However, I could not find all of the sentences in English. Only the dev and devtest data are available, which come to only around 2009 sentences. Where can I find the rest of the dataset's sentences, so that I can start translating them?

Hi Sanjib, thank you for your interest in translating Flores. Of course, we're open to community contributions. Having Bodo would be great!!
For the moment, we're keeping the test set completely blind, as we use it to run evaluation campaigns. My suggestion: please start with dev and devtest. Once those are done, we can agree on additional Quality Assurance steps and on the translation of the blind test set. Ping us at flores@fb.com.

@guzmanhe Thank you for the suggestion. Yes, we will start by translating the dev and devtest data. We are using doccano for sequence-to-sequence translation, shared among all the translators. As there are very few experts who want to participate, it may take around 1-2 months. Once the translations of dev and devtest are done, we will certainly get in touch at flores@fb.com.
Thank you once again for your quick and informative reply.