AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News & Hate Speech Detection Dataset

Description:

AraCOVID19-MFH (arXiv:2105.03143) is a manually annotated, multi-label Arabic dataset for COVID-19 fake news and hate speech detection. It contains 10,828 Arabic tweets annotated with 10 different labels. The labels, their values, and their meanings are given in the table below:

Example instances from the dataset are shown in the table below:

Content:

The number of tweets per topic is given in the table below:

Usage:

The labels were designed to capture aspects relevant to the fact-checking task, such as a tweet's check-worthiness, positivity/negativity, and factuality. Although the dataset is mainly intended for fake news detection, it can also be used for hate speech detection, opinion/news classification, dialect identification, and other related tasks.

Data Retrieval:

In accordance with Twitter’s Terms of Service, we provide only the tweet IDs (in the AraCOVID19-MFH_V1.0.csv file).

Tools such as Twarc or Hydrator can be used to retrieve the tweets from their IDs. If you run into any problems, contact the authors at the email addresses listed in the Contacts section below.
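Both Twarc and Hydrator take a plain text file with one ID per line as input. The following is a minimal sketch of how such a file could be produced from the CSV, using only the Python standard library. The function name `extract_ids` and the `id_column` argument are illustrative assumptions; the actual name of the ID column must be read from the header of AraCOVID19-MFH_V1.0.csv.

```python
import csv
from pathlib import Path


def extract_ids(csv_path, id_column, out_path):
    """Write the unique, non-empty values of `id_column` to `out_path`, one per line.

    `id_column` is an assumption: check the CSV header for the real column name.
    IDs are kept as strings so that large numeric IDs are never rounded.
    """
    seen = set()
    ids = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if id_column not in (reader.fieldnames or []):
            raise KeyError(f"column {id_column!r} not found in {csv_path}")
        for row in reader:
            value = (row[id_column] or "").strip()
            # Skip blanks and duplicates, keeping the original order.
            if value and value not in seen:
                seen.add(value)
                ids.append(value)
    Path(out_path).write_text("".join(i + "\n" for i in ids), encoding="utf-8")
    return ids
```

For example, `extract_ids("AraCOVID19-MFH_V1.0.csv", "<ID column name>", "ids.txt")` would write `ids.txt`, which can then be loaded into Hydrator or passed to Twarc (with twarc v1, `twarc hydrate ids.txt > tweets.jsonl`). Tweets that have since been deleted or made private will not be returned.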

License:

The AraCOVID19-MFH dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).

Citations:

Please cite as:

@misc{ameur2021aracovid19mfh,
      title={AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset}, 
      author={Mohamed Seghir Hadj Ameur and Hassina Aliane},
      year={2021},
      eprint={2105.03143},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contacts:

For additional information, please contact mohamedhadjameur@gmail.com, ahassina@cerist.dz, or drdhn@cerist.dz.
