
ACQ Survey

The official GitHub repository of "A Survey on Asking Clarification Questions Datasets in Conversational Systems" (ACL 2023). Paper: https://arxiv.org/abs/2305.15933

Abstract

The ability to understand a user’s underlying needs is critical for conversational systems, especially with limited input from users in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users’ true intent from their queries or utterances arises as an essential task. However, a key limitation of existing ACQ studies is their incomparability, resulting from inconsistent use of data and distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQ techniques, we comprehensively analyse the current ACQ research status, offer a detailed comparison of publicly available datasets, and discuss the applied evaluation metrics, together with benchmarks for multiple ACQ-related tasks. In particular, given a thorough analysis of the ACQ task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.

Datasets

Conversational Search

| Year | Acronym | Authors | Title | Venue | Code & Dataset | Leaderboard |
|------|---------|---------|-------|-------|----------------|-------------|
| 2023 | ClariT | Yue et al. | Towards asking clarification questions for information seeking on task-oriented dialogues | arXiv'23 | ClariT | - |
| 2019 | Qulac | Aliannejadi et al. | Asking clarifying questions in open-domain information-seeking conversations | SIGIR'19 | Qulac | - |
| 2021 | ClariQ | Aliannejadi et al. | Building and evaluating open-domain dialogue corpora with clarifying questions | EMNLP'21 | ClariQ | http://convai.io/ |
| 2021 | TavakoliCQ | Tavakoli et al. | Analyzing clarification in asynchronous information-seeking conversations | JASIST'21 | TavakoliCQ | - |
| 2020 | MIMICS | Zamani et al. | MIMICS: A Large-Scale Data Collection for Search Clarification | CIKM'20 | MIMICS | - |
| 2019 | MANtIS | Penha et al. | Introducing MANtIS: a novel Multi-Domain Information Seeking Dialogues Dataset | arXiv'19 | MANtIS | - |
| 2021 | ClariQ-FKw | Sekulić et al. | Towards facet-driven generation of clarifying questions for conversational search | ICTIR'21 | ClariQ-FKw | - |
| 2018 | MSDialog | Qu et al. | Analyzing and characterizing user intent in information-seeking conversations | SIGIR'18 | MSDialog | - |
| 2022 | MIMICS-Duo | Tavakoli et al. | MIMICS-Duo: Offline & online evaluation of search clarification | SIGIR'22 | MIMICS-Duo | - |
| 2022 | CAsT (Year 4) | Owoicho et al. | TREC CAsT 2022: Going Beyond User Ask and System Retrieve with Initiative and Response Generation | TREC'22 | CAsT | - |

Conversational Question Answering

| Year | Acronym | Authors | Title | Venue | Code & Dataset | Leaderboard |
|------|---------|---------|-------|-------|----------------|-------------|
| 2020 | ClarQ | Kumar and Black | ClarQ: A large-scale and diverse dataset for clarification question generation | ACL'20 | ClarQ | - |
| 2018 | RaoCQ | Rao and Daumé III | Learning to ask good questions: Ranking clarification questions using neural expected value of perfect information | ACL'18 | RaoCQ | - |
| 2019 | AmazonCQ | Rao and Daumé III | Answer-based adversarial training for generating clarification questions | NAACL'19 | AmazonCQ | - |
| 2019 | CLAQUA | Xu et al. | Asking Clarification Questions in Knowledge-Based Question Answering | EMNLP-IJCNLP'19 | CLAQUA | - |

Semantic Representation

We use t-distributed Stochastic Neighbor Embedding (t-SNE) to visualise the semantic representations (embeddings) of clarification questions, comparing the Conversational Search and Conversational Question Answering datasets.

Figure: t-SNE plots of the Conversational QA datasets (left) and the Conversational Search datasets (right).
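
For illustration, a minimal sketch of such a pipeline, assuming sentence-transformers (all-MiniLM-L6-v2) as the encoder and scikit-learn's t-SNE; the questions, labels, and choice of embedding model below are illustrative, not the exact setup used in the survey:

```python
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer  # assumed encoder
from sklearn.manifold import TSNE

# Toy clarification questions from two dataset families (illustrative only)
questions = [
    "Do you want information about the animal or the car brand?",  # search
    "Are you looking for recent news on this topic?",              # search
    "Which version of the product are you asking about?",          # QA
    "Could you specify the programming language you are using?",   # QA
]
labels = ["search", "search", "qa", "qa"]

# Encode each question into a dense semantic vector
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(questions)

# Project the embeddings into 2D with t-SNE (perplexity must be < n_samples)
points = TSNE(n_components=2, perplexity=2, random_state=42).fit_transform(embeddings)

# Scatter plot, coloured by dataset family
for family in set(labels):
    idx = [i for i, l in enumerate(labels) if l == family]
    plt.scatter(points[idx, 0], points[idx, 1], label=family)
plt.legend()
plt.title("t-SNE of clarification question embeddings")
plt.show()
```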

Tasks

T1: Clarification Need Prediction
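
T1 is commonly framed as (binary) classification: given a user query, predict whether a clarification question should be asked (ClariQ, for example, provides graded clarification-need labels). A minimal sketch of a lexical baseline, with made-up queries and labels for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = clarification needed, 0 = not needed (illustrative)
queries = [
    "jaguar",                        # ambiguous -> needs clarification
    "tell me about dinosaurs",
    "apple",                         # ambiguous -> needs clarification
    "who wrote pride and prejudice",
]
needs_clarification = [1, 0, 1, 0]

# Simple lexical baseline: TF-IDF features + logistic regression
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(queries, needs_clarification)

print(clf.predict(["python"]))  # e.g. a short, potentially ambiguous query
```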

T2: Asking Clarification Questions (Ranking CQs)

Requirements:

```
!pip install --upgrade python-terrier
!pip install --upgrade git+https://github.com/cmacdonald/pyterrier_bert.git
```

We preprocess each dataset into train/validation/test splits of query-question pairs; the ranking approaches are implemented in:

ranking_approaches.py
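
As a sketch of how the ranking task can be set up with PyTerrier: index the pool of clarification questions, then rank them against each query with a lexical model such as BM25. The questions and queries below are illustrative; see ranking_approaches.py for the actual approaches used:

```python
import pandas as pd
import pyterrier as pt

if not pt.started():
    pt.init()

# Illustrative pool of clarification questions; in practice these come
# from the preprocessed dataset splits
cqs = pd.DataFrame([
    {"docno": "cq1", "text": "Do you mean the animal or the car brand?"},
    {"docno": "cq2", "text": "Are you interested in recent news about it?"},
])

# Index the clarification questions
indexer = pt.IterDictIndexer("./cq_index", meta={"docno": 16})
index_ref = indexer.index(cqs.to_dict(orient="records"))

# Rank the questions for each query with BM25
bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25")
queries = pd.DataFrame([{"qid": "q1", "query": "jaguar"}])
print(bm25.transform(queries))
```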

T3: User Satisfaction with CQs
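
T3 evaluates how satisfied users are with the posed clarification questions; MIMICS, for instance, provides engagement levels on a 0-10 scale. A minimal sketch that treats this as regression over (query, question) pairs, with made-up data for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy "query [SEP] clarifying question" pairs with MIMICS-style
# engagement levels in [0, 10] (values are made up for illustration)
pairs = [
    "jaguar [SEP] Do you mean the animal or the car brand?",
    "jaguar [SEP] What is your favourite colour?",
    "apple [SEP] Do you mean the fruit or the company?",
    "apple [SEP] Where do you live?",
]
engagement = [9.0, 1.0, 8.0, 0.0]

# Lexical regression baseline over the concatenated query-question text
reg = make_pipeline(TfidfVectorizer(), Ridge())
reg.fit(pairs, engagement)

print(reg.predict(["python [SEP] Do you mean the language or the snake?"]))
```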

Contributing

We welcome contributions to keep this repository up to date with newly released clarification question datasets and relevant techniques.

To contribute, follow the usual routine:

  1. Fork and clone the repository.
  2. Add your updates and create a pull request (see the GitHub help pages for detailed instructions).

We will regularly check the issues and pull requests for updating this repository.

Citation

If you use our source code, dataset, and experiments for your research or development, please cite the following paper:

@inproceedings{rahmani2023acqsurvey,
  title={A Survey on Asking Clarification Questions Datasets in Conversational Systems},
  author={Hossein A. Rahmani and Xi Wang and Yue Feng and Qiang Zhang and Emine Yilmaz and Aldo Lipani},
  booktitle={The 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, July 9-14, 2023},
  year={2023}
}

TODO

  • TREC CAsT Dataset

Team

  • Hossein A. Rahmani (UCL)
  • Xi Wang (UCL)
  • Yue Feng (UCL)
  • Qiang Zhang (Zhejiang University)
  • Emine Yilmaz (UCL & Amazon)
  • Aldo Lipani (UCL)

Contact

If you have any questions, do not hesitate to contact us at hossein.rahmani.22@ucl.ac.uk or xi-wang@ucl.ac.uk; we will be happy to assist.
