Welcome to the VerMouth dataset repository! VerMouth is a dataset for the automatic generation of personalised responses to misleading claims online.
It was introduced in the paper "Countering Misinformation via Emotional Response Generation", presented at EMNLP 2023.
If you use the VerMouth dataset, or any part of it, in your work, we kindly ask you to cite our original paper.
The VerMouth dataset comprises ~12,000 entries. Each entry contains four elements:
- claim: factual statement under analysis;
- fact-checking article: the link to a journalistic document containing all the evidence needed to fact-check a claim;
- verdict: a short textual response to the claim that explains why it might be true or false;
- style: a label indicating the style or emotion expressed in the claim.
Starting from the FullFact dataset (Russo et al., 2023), we rewrote both the claims and the verdicts in a social media communication style. To this end, we adopted the author-reviewer pipeline (Tekiroğlu et al., 2020), which combines instruction-based Large Language Models with human post-editing. A schema of our data collection strategy is depicted in the following image.
The final data were rewritten according to two styles:
- SMP style: resembles the style used on social media platforms (SMP), in particular Twitter.
- Emotional style: the social media communication style with the addition of an emotional component. We adopted Paul Ekman's six basic emotions, namely anger, surprise, disgust, enjoyment (labelled happiness in the dataset), fear, and sadness.
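For readers who load the data programmatically, the record layout described above can be sketched as a small Python dataclass that also rejects unknown style labels. This is a minimal sketch, not an official loader: the field names (`id`, `claim`, `verdict`, `article_url`, `style`), storing the id as a string, and the exact spelling and casing of the style labels in the released files are all assumptions.

```python
from dataclasses import dataclass

# Style labels as listed in this README. Their exact spelling/casing in the
# released files is an assumption.
STYLE_LABELS = frozenset(
    {"SMP", "anger", "disgust", "fear", "happiness", "sadness", "surprise"}
)


@dataclass(frozen=True)
class VerMouthEntry:
    """One VerMouth record. Field names are hypothetical, not official column names."""

    id: str           # shared by all stylistic versions of the same claim
    claim: str        # the rewritten claim
    verdict: str      # short response explaining why the claim may be true or false
    article_url: str  # link to the FullFact fact-checking article
    style: str        # "SMP" or one of the six emotion labels

    def __post_init__(self) -> None:
        # Reject any label outside the seven styles described in this README.
        if self.style not in STYLE_LABELS:
            raise ValueError(f"unknown style label: {self.style!r}")


# Usage with placeholder values (not real dataset content).
example = VerMouthEntry(
    id="0",
    claim="<claim text>",
    verdict="<verdict text>",
    article_url="<FullFact article URL>",
    style="SMP",
)
```

Freezing the dataclass keeps records immutable, which is convenient when the same entry is shared across several groupings or indexes.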
The following table presents the count of items for each subpart of the dataset.
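The first column counts the SMP-style entries; the remaining columns count the emotional-style entries per emotion, with their total under "all emotions".

| SMP-style | happiness | anger | fear | disgust | sadness | surprise | all emotions |
|---|---|---|---|---|---|---|---|
| 1838 | 1527 | 1590 | 1805 | 1675 | 1758 | 1797 | 10152 |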
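As a quick consistency check on these counts, the six per-emotion columns add up to the "all emotions" total, and adding the SMP-style entries gives the overall dataset size of roughly 12,000 entries. The snippet below only re-adds the numbers copied from the table.

```python
# Item counts copied from the table above.
smp_count = 1838
emotion_counts = {
    "happiness": 1527,
    "anger": 1590,
    "fear": 1805,
    "disgust": 1675,
    "sadness": 1758,
    "surprise": 1797,
}

all_emotions = sum(emotion_counts.values())  # emotional-style subset
total = smp_count + all_emotions             # SMP-style + emotional-style

print(all_emotions)  # 10152
print(total)         # 11990
```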
In the `data` folder, we provide the VerMouth dataset partitioned into train, validation, and test sets. Each entry comprises an id, a claim, a verdict, a link to the FullFact fact-checking article, and a style label (SMP, anger, disgust, fear, happiness, sadness, or surprise). All stylistic versions of the same claim share the same id.
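Because every stylistic version of a claim carries the same id, a natural first step is to group the rows by id so that, for example, the SMP-style and emotional rewrites of one claim can be compared side by side. The sketch below works on in-memory dictionaries: the on-disk file format is not specified here, so the keys `"id"`, `"style"`, and `"claim"` and the placeholder rows are assumptions standing in for whatever the released files contain.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def group_by_claim_id(rows: Iterable[Mapping[str, str]]) -> dict[str, dict[str, str]]:
    """Map each claim id to {style label: claim text}.

    Assumes each row has the hypothetical keys "id", "style" and "claim".
    """
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for row in rows:
        # Rows sharing an id are stylistic versions of the same claim.
        groups[row["id"]][row["style"]] = row["claim"]
    return dict(groups)


# Placeholder rows standing in for records read from the data folder.
rows = [
    {"id": "42", "style": "SMP", "claim": "<SMP-style claim>"},
    {"id": "42", "style": "anger", "claim": "<angry claim>"},
    {"id": "7", "style": "fear", "claim": "<fearful claim>"},
]

groups = group_by_claim_id(rows)
print(sorted(groups["42"]))  # ['SMP', 'anger']
```

Keying the inner dictionary by style label makes it straightforward to pick the SMP-style version of a claim and pair it with any of its emotional variants.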
If you use the VerMouth dataset in your research, please cite the following paper:
@inproceedings{russo-etal-2023-countering,
title = "Countering Misinformation via Emotional Response Generation",
author = "Russo, Daniel and
Kaszefski-Yaschuk, Shane and
Staiano, Jacopo and
Guerini, Marco",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.703",
doi = "10.18653/v1/2023.emnlp-main.703",
pages = "11476--11492",
}
VerMouth can be used for research purposes and cannot be redistributed. Please cite the corresponding publication if you use it.
For any questions or inquiries, please get in touch with drusso@fbk.eu or guerini@fbk.eu.