shaked571 / ParaShoot

A Hebrew Question Answering Dataset

ParaShoot

A Hebrew question answering dataset in the style of SQuAD, built from articles scraped from Wikipedia. It contains a few thousand question-answer pairs annotated through crowdsourcing, and is intended for a few-shot learning setting.

For more details and a quality analysis, see the paper (arXiv:2109.11314).
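The README only says the data is "in the style of SQuAD", so the sketch below assumes each split is stored as a SQuAD v1.1-style JSON file (`data` -> `paragraphs` -> `qas` -> `answers`, where `answer_start` is a character offset into `context`). The field names are SQuAD's. The helper names `iter_squad_examples` and `load_squad_file`, and any file path passed to the loader, are hypothetical. The toy Hebrew record is invented for illustration and is not taken from ParaShoot.

```python
import json
from typing import Iterator, List, NamedTuple


class QAExample(NamedTuple):
    """One question-answer pair flattened out of a SQuAD-style file."""
    qid: str
    title: str
    context: str
    question: str
    answer_text: str
    answer_start: int


def iter_squad_examples(squad: dict) -> Iterator[QAExample]:
    """Walk the assumed SQuAD v1.1 nesting: data -> paragraphs -> qas -> answers."""
    for article in squad["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Keep only the first gold answer; some files list several per question.
                answer = qa["answers"][0]
                yield QAExample(
                    qid=qa["id"],
                    title=title,
                    context=context,
                    question=qa["question"],
                    answer_text=answer["text"],
                    answer_start=answer["answer_start"],
                )


def load_squad_file(path: str) -> List[QAExample]:
    """Load one split; the path is hypothetical, not a documented file name."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))


if __name__ == "__main__":
    # Invented toy record in the assumed layout.
    toy = {
        "data": [{
            "title": "עברית",
            "paragraphs": [{
                "context": "העברית היא שפה שמית.",
                "qas": [{
                    "id": "toy-1",
                    "question": "לאיזו משפחת שפות שייכת העברית?",
                    "answers": [{"text": "שמית", "answer_start": 15}],
                }],
            }],
        }]
    }
    for ex in iter_squad_examples(toy):
        end = ex.answer_start + len(ex.answer_text)
        # The answer must be recoverable by slicing the context at the offset.
        assert ex.context[ex.answer_start:end] == ex.answer_text
        print(ex.qid, (ex.answer_start, end))
```

Flattening to one record per question keeps each paragraph's context next to its questions. That layout is what span-extraction QA models typically consume.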

Dataset Statistics

Split   #Items   #Articles   #Paragraphs
Train     1792         295           565
Dev        221          33            63
Test      1025         165           319
Total     3038         493           947
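The Total row is the column-wise sum of the Train, Dev and Test rows. The short Python check below re-derives it from the per-split counts and reports how the items are divided among the splits. The counts are copied from the table. Treating "items" as question-answer pairs is an assumption, and the names `SPLITS` and `column_totals` are illustrative only.

```python
from typing import Dict

# Per-split counts transcribed from the statistics table; "items" is assumed
# to count question-answer pairs.
SPLITS: Dict[str, Dict[str, int]] = {
    "Train": {"items": 1792, "articles": 295, "paragraphs": 565},
    "Dev": {"items": 221, "articles": 33, "paragraphs": 63},
    "Test": {"items": 1025, "articles": 165, "paragraphs": 319},
}

# The table's Total row, used as a consistency check.
REPORTED_TOTAL = {"items": 3038, "articles": 493, "paragraphs": 947}


def column_totals(splits: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum each column over all splits."""
    totals: Dict[str, int] = {}
    for counts in splits.values():
        for column, value in counts.items():
            totals[column] = totals.get(column, 0) + value
    return totals


if __name__ == "__main__":
    totals = column_totals(SPLITS)
    # The per-split rows should add up to the reported Total row.
    assert totals == REPORTED_TOTAL
    for name, counts in SPLITS.items():
        share = counts["items"] / totals["items"]
        per_paragraph = counts["items"] / counts["paragraphs"]
        print(f"{name}: {share:.1%} of items, {per_paragraph:.2f} items per paragraph")
```

When run as a script, it confirms that the split rows sum to the Total row. It then prints each split's share of the items (59.0%, 7.3% and 33.7%) and its average number of items per paragraph (3.17, 3.51 and 3.21).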

Citing

If you use ParaShoot in your research, please cite the ParaShoot paper:

@article{keren2021parashoot,
  title={ParaShoot: A Hebrew Question Answering Dataset},
  author={Keren, Omri and Levy, Omer},
  journal={arXiv preprint arXiv:2109.11314},
  year={2021}
}

Languages

Python 97.6%, Shell 2.4%