The dataset was created using a semi-automated framework that generates diverse paraphrases of the answers using techniques such as back-translation. Existing datasets for conversational question answering over knowledge graphs (KGs), whether single-turn or multi-turn, focus on question paraphrasing and provide at most one answer verbalization. In contrast, ParaQA contains 5000 question-answer pairs, each with a minimum of two and a maximum of eight unique paraphrased responses. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available at a persistent URI for broader usage and adaptation in the research community.
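To give a rough intuition for why several reference answers help with overlap metrics such as BLEU, here is a minimal sketch of clipped n-gram precision, one of the building blocks of BLEU, computed against one or several references. This is not the evaluation code used for the baselines; the function names and the example sentences are invented for illustration and are not taken from ParaQA.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references.

    Each candidate n-gram count is clipped to the highest count of that
    n-gram in any single reference, as in BLEU's modified precision.
    Tokenization is plain lowercased whitespace splitting (an assumption).
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # For each n-gram, keep the maximum count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical system output and reference verbalizations (not from ParaQA).
candidate = "the capital of france is paris"
reference_a = "france has paris as its capital"
reference_b = "paris is the capital of france"

single = modified_precision(candidate, [reference_a])
multi = modified_precision(candidate, [reference_a, reference_b])
print(single, multi)
```

With only `reference_a`, a correct but differently worded answer is penalized; adding a second paraphrase gives the candidate more valid wordings to match. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch omits.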
Alongside the dataset, we provide the framework for generating multiple paraphrased responses and the baseline models. Because of the free distribution license agreement, they are hosted in a separate repository.
The dataset is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).
Coming Soon!