Is the monolingual data used in the paper available for download?

xiamengzhou opened this issue

Hi, I can't find a way to access the monolingual data used in the paper. Is there any way I can get it?

Hi. You can download the data from the shared task webpage: http://www.statmt.org/wmt19/parallel-corpus-filtering.html

Thanks! But it looks like the Common Crawl monolingual data for sin and nep is not provided on the shared task webpage?

The commoncrawl data links have now been updated on the shared task webpage: http://www.statmt.org/wmt19/parallel-corpus-filtering.html

Hi, I'm having trouble decompressing "commoncrawl.deduped.en.xz".

unxz: commoncrawl.deduped.en.xz: Unexpected end of input

I can decompress other files. Is there anything wrong with the file?
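
For anyone hitting the same error: "Unexpected end of input" from unxz typically means the compressed stream stops before its end marker, which can happen after an interrupted download or an incomplete upload. Below is a minimal sketch for checking a local copy, assuming Python 3 with the standard `lzma` module. The helper name `xz_is_complete` is hypothetical and not part of the shared task tooling.

```python
import lzma


def xz_is_complete(path, chunk_size=1 << 20):
    """Return True if the .xz file at `path` decompresses through to its end.

    Hypothetical helper for illustration. It streams the file in chunks,
    so the decompressed data is never held in memory all at once.
    """
    try:
        with lzma.open(path, "rb") as f:
            # Read and discard the output until the stream is exhausted.
            while f.read(chunk_size):
                pass
    except EOFError:
        # Raised when the file ends before the end-of-stream marker.
        return False
    except lzma.LZMAError:
        # Raised when the data is corrupt or not in a supported format.
        return False
    return True


if __name__ == "__main__":
    print(xz_is_complete("commoncrawl.deduped.en.xz"))
```

If this prints `False` for a fresh download, comparing the local file size with the size reported by the server may help tell a truncated download apart from a problem with the hosted file.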