google-research-datasets / clang8

cLang-8 is a dataset for grammatical error correction.

Dataset languages

Bachstelze opened this issue

The paper describes many languages.
Does this dataset cover all of them?

This repo contains the relabeled targets for English, German and Russian. For pre-training, we used a Common Crawl dataset with 101 languages.