thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora

Home Page: https://pypi.org/project/mtdata/

Parallel Corpora for 6 Indian Languages

kpu opened this issue · comments

http://catalog.elra.info/en-us/repository/browse/ELRA-W0320/#

CC-BY-SA-3.0

Not sure why there isn't a download link on the main page. I guess somebody needs to go in with an ELRA login, get it, and rehost it.

I believe this is the data that we released in this paper? In that case, there is a more direct link. I'm not sure why ELRA has appropriated it with no mention or citation.

That said, the data was translated into English by English L2 speakers. The quality isn't great, though it might serve for translating out of English.

I wonder why ELRA didn't mention or cite the paper! The description looks very similar to the one in the paper.
BTW, we have already added the joshua-decoder/indian-parallel-corpora corpus (see mtdata list -id -g JoshuaDec).

url = 'https://github.com/joshua-decoder/indian-parallel-corpora/archive/a2cd1a99.tar.gz'
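For anyone who wants the raw files without going through mtdata, here is a minimal sketch of fetching and unpacking that tarball by hand with the Python standard library. This is not how mtdata does it internally; the helper names `archive_name` and `fetch_and_extract` and the target directory are made up for illustration, and only the URL comes from the line above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted in the comment above; everything else here is a hypothetical helper.
URL = 'https://github.com/joshua-decoder/indian-parallel-corpora/archive/a2cd1a99.tar.gz'


def archive_name(url: str) -> str:
    """Return the last path component of the archive URL (the tarball file name)."""
    return url.rsplit('/', 1)[-1]


def fetch_and_extract(url: str, dest: Path) -> Path:
    """Download the tarball into dest (if not already there) and unpack it.

    Requires network access. Only run this on archives you trust, since
    extractall() writes whatever paths the archive contains.
    """
    dest.mkdir(parents=True, exist_ok=True)
    tar_path = dest / archive_name(url)
    if not tar_path.exists():
        urllib.request.urlretrieve(url, str(tar_path))
    with tarfile.open(tar_path, 'r:gz') as tar:
        tar.extractall(dest)
    return dest


# Example call (assumed directory name, performs a real download):
# fetch_and_extract(URL, Path('indian-parallel-corpora'))
```

Pinning the URL to a commit hash, as the entry above does, means repeated downloads should give the same files even if the repository changes later.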