facebookresearch / flores

Facebook Low Resource (FLoRes) MT Benchmark

Non-matching quotation marks in some dev/devtest sets

DCSaunders opened this issue

The Flores dev and devtest sets contain many runs of two or more consecutive double quotation marks.

E.g.:

grep '""' flores101_dataset/dev/*dev

The number of affected sentences seems to vary by language: eng.dev has none, while tel.dev has 72.
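To get a per-file count rather than the matching lines themselves, something like the following Python sketch could be used. It applies the same test as the grep command above (a line counts if it contains `""` anywhere). The function name `count_doubled_quote_lines` is made up for illustration, and the path pattern is simply copied from the grep command, so it assumes the same directory layout.

```python
import glob


def count_doubled_quote_lines(lines):
    """Count lines that contain at least two consecutive double quotes."""
    return sum(1 for line in lines if '""' in line)


if __name__ == "__main__":
    # Path pattern copied from the grep command; adjust it to your checkout.
    for path in sorted(glob.glob("flores101_dataset/dev/*dev")):
        with open(path, encoding="utf-8") as f:
            print(path, count_doubled_quote_lines(f))
```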

While users can clean the files and the effect on evaluation is probably not too strong, it seemed worth flagging if there is ever a dataset update.
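For anyone who wants to clean the files locally in the meantime, here is a minimal sketch of one possible approach. It assumes that every run of two or more consecutive double quotes should collapse to a single quote. That is only a guess at the intended text, not a confirmed fix, so it is worth checking a few sentences by hand before relying on it. The name `collapse_repeated_quotes` is hypothetical.

```python
import re

# Matches two or more consecutive ASCII double quotes.
_REPEATED_QUOTES = re.compile(r'"{2,}')


def collapse_repeated_quotes(line):
    """Replace each run of repeated double quotes with a single one.

    Assumption: a run such as "" or \"\"\" stands for one quotation mark.
    """
    return _REPEATED_QUOTES.sub('"', line)


if __name__ == "__main__":
    print(collapse_repeated_quotes('He said ""hello"" to them.'))
```

Lines that contain only single quotation marks pass through unchanged, so the function can be applied to every line of a file.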