A Hebrew question and answering dataset in the style of SQuAD, based on articles scraped from Wikipedia. The dataset contains a few thousand crowdsource-annotated pairs of questions and answers, in a setting suitable for few-shot learning.
For more details and quality analysis, see the paper.
#Items | #Articles | #Paragraphs | |
---|---|---|---|
Train | 1792 | 295 | 565 |
Dev | 221 | 33 | 63 |
Test | 1025 | 165 | 319 |
Total | 3038 | 493 | 947 |
If you use ParaShoot in your research, please cite the ParaShoot paper:
@article{keren2021parashoot,
title={ParaShoot: A Hebrew Question Answering Dataset},
author={Keren, Omri and Levy, Omer},
journal={arXiv preprint arXiv:2109.11314},
year={2021}
}