Dataset languages
Bachstelze opened this issue
Bachstelze commented
The paper describes many languages.
Does this dataset cover all of them?
Eric Malmi commented
This repo contains the relabeled targets for English, German and Russian. For pre-training, we used a Common Crawl dataset with 101 languages.