masakhane-io / masakhane-mt

Machine Translation for Africa

Fula Pulaar <-> English Resource (Sentence Pairs)

nikisix opened this issue · comments

Hi, I would like to contribute a Pulaar translator model, but I need to be pointed to the sentence pairs. Can anyone help me out?

Hi @nikisix! It looks like JW300, which we used as a source for other languages, does not include Pulaar. You can look for other corpora on the OPUS website: https://opus.nlpl.eu/. It lists CCAligned, Wikimedia, Ubuntu, and QED for Fula, but I'm not sure whether any of them is Pulaar. The CCAligned corpus was previously found (https://arxiv.org/abs/2103.12028) to contain mostly noise for Fula, so I would not recommend using it. Perhaps Wikimedia, Ubuntu, or QED? These might be quite domain-specific, though.
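If you do try one of those corpora, it is probably worth a quick sanity filter before training, given the noise found in CCAligned. Below is a minimal sketch, assuming the corpus is available as two aligned plain-text files with one sentence per line. The function name `filter_pairs`, the thresholds, and the file names in the usage note are hypothetical and not part of any Masakhane or OPUS tooling.

```python
from typing import Iterable, List, Tuple


def filter_pairs(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    max_ratio: float = 3.0,  # assumed threshold, tune per corpus
    max_len: int = 200,      # assumed threshold, in whitespace tokens
) -> List[Tuple[str, str]]:
    """Keep sentence pairs that pass a few cheap noise heuristics."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop untranslated copies (source identical to target).
        if src == tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long sentences.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose lengths differ too much to be plausible translations.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept
```

To use it on files, open both sides and pass the file objects, for example `filter_pairs(open("corpus.en", encoding="utf-8"), open("corpus.ff", encoding="utf-8"))`, where both file names are placeholders. These heuristics only catch obvious noise; they do not tell you whether the text is actually Pulaar.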

I haven't used those last sources you mention before. I did notice that JW300 has the code 'fub' defined for Pular, but unfortunately there are no supporting data files.