How to get the Tatoeba corpus used in Deepnet?
SefaZeng opened this issue
zengxianfeng commented
Describe
I am collecting the corpora used in Deepnet, but I can't find where to download the Tatoeba corpus. Is this what you used in the paper?
Also, I found that the training data used in Deepnet is about 13B sentences, whereas M2M-100 seems to use only 7.5B sentences, consisting of CCMatrix and CCAligned only. So, as I understand it, this is not a fair comparison?