ParaShoot

A Hebrew question answering dataset in the style of SQuAD, based on articles scraped from Wikipedia. The dataset contains 3,038 crowdsourced question-answer pairs, in a setting suitable for few-shot learning.

For more details and quality analysis, see the paper.

Dataset Statistics

Split	#Items	#Articles	#Paragraphs
Train	1792	295	565
Dev	221	33	63
Test	1025	165	319
Total	3038	493	947
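
The three columns above can be recomputed from the data itself. The following is a minimal sketch, assuming each example is a flat SQuAD-style record with `title` and `context` fields. Those field names and the `split_statistics` helper are assumptions for illustration, not the repository's confirmed schema or API, so adapt them to the actual files.

```python
from typing import Dict, Iterable, Tuple


def split_statistics(records: Iterable[Dict]) -> Tuple[int, int, int]:
    """Return (#items, #articles, #paragraphs) for one split.

    Assumes (hypothetically) that each record has a "title" field naming
    the source article and a "context" field holding the paragraph text.
    """
    items = 0
    articles = set()
    paragraphs = set()
    for rec in records:
        items += 1                      # one question-answer pair per record
        articles.add(rec["title"])      # distinct source articles
        # Key paragraphs by (article, text) so identical text in two
        # different articles is still counted as two paragraphs.
        paragraphs.add((rec["title"], rec["context"]))
    return items, len(articles), len(paragraphs)


# Toy records for illustration only; this is not real ParaShoot data.
toy_split = [
    {"title": "Article A", "context": "Paragraph 1", "question": "Q1"},
    {"title": "Article A", "context": "Paragraph 1", "question": "Q2"},
    {"title": "Article B", "context": "Paragraph 2", "question": "Q3"},
]

n_items, n_articles, n_paragraphs = split_statistics(toy_split)
print(n_items, n_articles, n_paragraphs)
```

Running the same function over each split should reproduce the corresponding row of the table, provided the field-name assumptions hold.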

Citing

If you use ParaShoot in your research, please cite the ParaShoot paper:

@article{keren2021parashoot,
  title={ParaShoot: A Hebrew Question Answering Dataset},
  author={Keren, Omri and Levy, Omer},
  journal={arXiv preprint arXiv:2109.11314},
  year={2021}
}
