The official GitHub repository of "A Survey on Asking Clarification Questions Datasets in Conversational Systems (ACL 2023)".
The ability to understand a user’s underlying needs is critical for conversational systems, especially with limited input from users in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users’ true intent from their queries or utterances arises as an essential task. However, a key limitation of existing ACQs studies is their incomparability, stemming from inconsistent use of data and distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQs techniques, we comprehensively analyse the current ACQs research status, offering a detailed comparison of publicly available datasets and discussing the applied evaluation metrics, together with benchmarks for multiple ACQs-related tasks. In particular, based on a thorough analysis of the ACQs task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.
Year | Acronym | Authors | Title | Venue | Code & Dataset | Leaderboard |
---|---|---|---|---|---|---|
2020 | ClarQ | Kumar and Black | ClarQ: A large-scale and diverse dataset for clarification question generation | ACL'20 | ClarQ | - |
2018 | RaoCQ | Rao and Daumé III | Learning to ask good questions: Ranking clarification questions using neural expected value of perfect information | ACL'18 | RaoCQ | - |
2019 | AmazonCQ | Rao and Daumé III | Answer-based adversarial training for generating clarification questions | ACL'19 | AmazonCQ | - |
2019 | CLAQUA | Xu et al. | Asking Clarification Questions in Knowledge-Based Question Answering | EMNLP-IJCNLP'19 | CLAQUA | - |
We leverage t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the semantic representations (embeddings) of clarification questions, comparing the conversational search and conversational question answering datasets.
Conversational QA Datasets | Conversational Search Datasets |
---|---|
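The comparison above can be sketched as follows with scikit-learn's t-SNE. The random vectors below are placeholders for the semantic embeddings; the survey's actual pipeline would encode the clarification questions with a sentence embedding model before projecting them:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder semantic embeddings for clarification questions from two
# dataset families (a real pipeline would obtain these from a sentence
# encoder; the dimensions and counts here are illustrative).
rng = np.random.default_rng(42)
qa_embeddings = rng.normal(loc=0.0, scale=1.0, size=(50, 768))      # conversational QA
search_embeddings = rng.normal(loc=0.5, scale=1.0, size=(50, 768))  # conversational search

embeddings = np.vstack([qa_embeddings, search_embeddings])
labels = ["QA"] * 50 + ["Search"] * 50

# Project the 768-d embeddings down to 2-d for plotting.
# perplexity must be smaller than the number of samples.
points = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)

print(points.shape)  # (100, 2) -- one 2-d point per question, ready to scatter-plot
```

The resulting 2-d points can then be scatter-plotted per dataset family to inspect how the two groups of clarification questions cluster.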
The requirements:

```shell
pip install --upgrade python-terrier
pip install --upgrade git+https://github.com/cmacdonald/pyterrier_bert.git
```
We preprocess each dataset into train/val/test sets with query-question pairs:
`ranking_approaches.py`
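The preprocessing step above can be sketched as follows; the 80/10/10 ratio and the shuffling seed are illustrative assumptions, not necessarily the exact setup used in this repository:

```python
import random

def split_pairs(pairs, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle (query, clarification_question) pairs and split them into
    train/val/test portions. Ratios and seed are illustrative."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = int(len(pairs) * train_frac)
    n_val = int(len(pairs) * val_frac)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# Toy query-question pairs standing in for one dataset.
pairs = [(f"query {i}", f"clarification question {i}") for i in range(100)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))  # 80 10 10
```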
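The repository's ranking approaches are built on PyTerrier (which requires a JVM); as a self-contained illustration of the underlying idea, here is a plain BM25 scorer ranking candidate clarification questions against a query. It is a simplified stand-in, not the code in `ranking_approaches.py`:

```python
import math
from collections import Counter

def bm25_scores(query, questions, k1=1.2, b=0.75):
    """Score candidate clarification questions against a query with plain
    BM25 (a simplified stand-in for the PyTerrier retrieval models)."""
    docs = [q.lower().split() for q in questions]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

questions = [
    "what operating system are you using",
    "which version of python do you have installed",
    "do you mean the python programming language",
]
scores = bm25_scores("python version", questions)
best = questions[max(range(len(scores)), key=scores.__getitem__)]
print(best)  # "which version of python do you have installed"
```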
We welcome contributions to keep this repository up to date as asking-clarification-question datasets and related techniques continue to develop.
To contribute, please follow the usual routine:
- Fork and clone the repository.
- Add your updates and create a pull request (see the GitHub help pages for detailed instructions).
We will regularly check the issues and pull requests to keep this repository updated.
If you use our source code, dataset, or experiments in your research or development, please cite the following paper:
@inproceedings{rahmani2023acqsurvey,
title={A Survey on Asking Clarification Questions Datasets in Conversational Systems},
author={Rahmani, Hossein A. and Wang, Xi and Feng, Yue and Zhang, Qiang and Yilmaz, Emine and Lipani, Aldo},
booktitle={The 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, July 9-14, 2023},
year={2023}
}
- TREC CAsT Dataset
- Hossein A. Rahmani (UCL)
- Xi Wang (UCL)
- Yue Feng (UCL)
- Qiang Zhang (Zhejiang University)
- Emine Yilmaz (UCL & Amazon)
- Aldo Lipani (UCL)
If you have any questions, do not hesitate to contact us at hossein.rahmani.22@ucl.ac.uk or xi-wang@ucl.ac.uk; we will be happy to assist.