ju-resplande / askD

AskDocs: A medical QA dataset

The ELI5 dataset adapted to the Medical Questions (AskDocs) subreddit.

Getting Started

Language  Train  Valid  Test  External
en        24256  5198   5198  166804
pt        24256  5198   5198  166804
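
As a quick sanity check on the split sizes listed above, the following sketch computes each split's share of the train/valid/test total. The sizes are copied from the table; the names `SPLIT_SIZES` and `split_fractions` are illustrative and not part of the repository. The external data is left out because it comes from a separate source.

```python
# Split sizes copied from the table above (identical for en and pt).
SPLIT_SIZES = {"train": 24256, "valid": 5198, "test": 5198}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
for name, share in fractions.items():
    # Prints roughly 70.0% / 15.0% / 15.0%.
    print(f"{name}: {share:.1%}")
```

The numbers work out to an approximately 70/15/15 split.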

The dataset questions and answers span a period from January 2013 to December 2019.

We additionally translated the dataset to Portuguese and added external data from here, a binary classification dataset described as "a QNLI medical-like". We mapped its binary labels to the values 5 or 0.
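
The conversion of the external labels to 5 or 0 could look something like the sketch below. This is only an illustration: the README does not say which class maps to which value, so the direction used here (1 becomes 5, 0 stays 0) is an assumption, and `binary_to_score` is a hypothetical helper, not code from the repository.

```python
def binary_to_score(label: int) -> int:
    """Map a binary label onto the 0/5 scale.

    Assumption: the positive class (1) becomes 5 and the negative
    class (0) stays 0. The README does not state the direction.
    """
    if label not in (0, 1):
        raise ValueError(f"expected a binary label (0 or 1), got {label!r}")
    return 5 if label == 1 else 0


# Made-up labels, for illustration only.
external_labels = [1, 0, 0, 1]
print([binary_to_score(y) for y in external_labels])  # [5, 0, 0, 5]
```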

Usage

Datasets 🤗

from datasets import load_dataset

# Load the Portuguese training split.
data = load_dataset("ju-resplande/askD", split="train_pt")
# Available splits:
# ['train_en', 'validation_en', 'test_en', 'external_en', 'train_pt', 'validation_pt', 'test_pt', 'external_pt']
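
Since the split names follow a `<partition>_<language>` pattern, a small wrapper can build and validate them before loading. The sketch below assumes that pattern holds for all eight splits listed above; `split_name` and `load_askd` are hypothetical helpers, and only `load_dataset` comes from the `datasets` library.

```python
LANGUAGES = ("en", "pt")
PARTITIONS = ("train", "validation", "test", "external")


def split_name(partition: str, language: str) -> str:
    """Build a split name such as 'validation_pt', rejecting unknown values."""
    if partition not in PARTITIONS:
        raise ValueError(f"unknown partition {partition!r}, expected one of {PARTITIONS}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language {language!r}, expected one of {LANGUAGES}")
    return f"{partition}_{language}"


def load_askd(partition: str, language: str):
    """Load one split of the dataset (downloads it on first use)."""
    # Imported lazily so split_name() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("ju-resplande/askD", split=split_name(partition, language))


# Enumerate every split name without touching the network.
all_splits = [split_name(p, lang) for lang in LANGUAGES for p in PARTITIONS]
print(all_splits)
```

Calling `load_askd("test", "en")` would then be equivalent to passing `split="test_en"` directly.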

Citing

@misc{Gomes20202,
  author = {GOMES, J. R. S.},
  title = {AskDocs: A medical QA dataset},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ju-resplande/askD}},
  commit = {42060c4402c460e174cbb75a868b429c554ba2b7}
}

Acknowledgments

Thanks to @viniciusplo and @ruanchaves for the idea. 😃

About

License: GNU General Public License v3.0


Languages

Jupyter Notebook 55.4%, Python 44.6%