migalkin / SQuAD-es-mt

Spanish version of SQuAD 1.1 and 2.0 obtained via machine translation

SQuAD-es-mt

Spanish version of SQuAD 1.1 and 2.0 obtained via machine translation provided by Tilde.

This is one of the two es-SQuAD datasets used in the paper "El Departamento de Nosotros: How Machine Translated Corpora Affects Language Models in MRC Tasks", presented at the HI4NLP Workshop @ ECAI 2020.

Statistics

Split    v1.1     v2.0
train    57280    56764
dev      7962     4530
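
To check figures like those above against a local copy, the following minimal sketch counts entries in a SQuAD-style JSON file. It assumes the translated files keep the original SQuAD schema (`data` → `paragraphs` → `qas`, with `is_impossible` present only in v2.0), which this README does not state. It also assumes the statistics count questions, so compare against whichever counter matches. The function names and the file name in the usage note are hypothetical.

```python
import json
from pathlib import Path


def count_squad_examples(dataset: dict) -> dict:
    """Count articles, paragraphs, questions and unanswerable questions.

    Assumes the standard SQuAD layout: data -> paragraphs -> qas.
    """
    counts = {"articles": 0, "paragraphs": 0, "questions": 0, "unanswerable": 0}
    for article in dataset.get("data", []):
        counts["articles"] += 1
        for paragraph in article.get("paragraphs", []):
            counts["paragraphs"] += 1
            for qa in paragraph.get("qas", []):
                counts["questions"] += 1
                # "is_impossible" exists only in v2.0-style files;
                # treat a missing flag as answerable.
                if qa.get("is_impossible", False):
                    counts["unanswerable"] += 1
    return counts


def count_squad_file(path: str) -> dict:
    """Load a SQuAD-style JSON file (UTF-8) and count its entries."""
    with Path(path).open(encoding="utf-8") as f:
        return count_squad_examples(json.load(f))
```

For example, `count_squad_file("train-v2.0-es.json")` (a hypothetical file name) returns a dictionary whose `questions` value can be compared with the table.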

Citation

If you find the dataset useful, please cite:

@inproceedings{Khvalchik2020ElDD,
  title={El Departamento de Nosotros: How Machine Translated Corpora Affects Language Models in MRC Tasks},
  booktitle={Proceedings of HI4NLP Workshop at ECAI 2020},
  author={Maria Khvalchik and Mikhail Galkin},
  year={2020}
}

License: GNU General Public License v3.0