amazon-mechanical-turk corpus dataset review-spam

Paraphrased OPinion Spam (POPS) Corpus v1.0

Overview

We introduce a novel dataset called Paraphrased OPinion Spam (POPS), which contains a new type of review spam that imitates real human opinion reviews. In building POPS, we focused on the tendency of spammers to reference truthful reviews that have already been written. We therefore set out to create a paraphrased dataset of 400 deceptive, positive-sentiment reviews written by Amazon Mechanical Turk (AMT) workers (Turkers). For the task, we gave Turkers existing reviews of specific hotels, which contain factual information about the hotel along with the real opinions and feelings of actual visitors, and provided guidelines for paraphrasing them. We believe that introducing a type of review spam dataset that has not previously been studied will help advance opinion spam research.

This corpus contains:

400 paraphrased deceptive (positive sentiment) hotel reviews written by Amazon Mechanical Turk workers

Citation

If you download the dataset and plan to use it in your publications, please cite the corresponding paper:

Seongsoon Kim, Seongwoon Lee, Donghyeon Park, and Jaewoo Kang. 2017. Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset. In Proceedings of the 26th International Conference on World Wide Web (WWW '17), 827-836. DOI: https://doi.org/10.1145/3038912.3052607

bibtex (DBLP)

Questions

If you have any questions about this dataset, please email us at seongkim@korea.ac.kr or seongwoon@korea.ac.kr.
