The dataset includes 3,640 Persian-English code-mixed tweets labeled with sentiment values (positive, negative, neutral).
In compliance with Twitter's terms of service, we release only the following fields:
- Tweet IDs
- Searched keyword
- Labels (the individual labels from the three annotators and the overall label calculated using max voting)
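As a rough illustration of how an overall label could be derived from the three annotator labels, here is a minimal sketch. It reads "max voting" as picking the most frequent label. The function name `majority_label` is hypothetical, and returning `None` when no label wins is an assumption: how ties (for example, three different labels) are resolved in the released data is not described here.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if first place is tied."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # Assumption: a tie for first place (e.g. three annotators giving three
    # different labels) has no winner, so None is returned instead of guessing.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    print(majority_label(["positive", "positive", "neutral"]))
    print(majority_label(["positive", "negative", "neutral"]))
```

Comparing the result with the overall label shipped in the dataset is a quick way to check whether this reading of the voting rule matches the released labels.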
You can retrieve the tweet texts using Twitter's Developer API.
How to collect data from Twitter: https://chatbotslife.com/crawl-twitter-data-using-30-lines-of-python-code-e3fece99450e
How to get tweets with specific IDs: https://stackoverflow.com/questions/28384588/twitter-api-get-tweets-with-specific-id
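Because lookup-by-ID requests usually accept a limited number of IDs at a time, it helps to split the tweet IDs into batches first. The sketch below only builds request URLs and sends nothing. It assumes the v2 tweet lookup endpoint (`https://api.twitter.com/2/tweets`) and a limit of 100 IDs per request; both should be checked against the current API documentation, and real requests also need your own credentials. The helper names `batched` and `lookup_urls` are hypothetical.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode

# Assumptions: v2 tweet lookup endpoint and its per-request ID limit.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def lookup_urls(ids: Sequence[str],
                size: int = MAX_IDS_PER_REQUEST) -> Iterator[str]:
    """Build one lookup URL per batch of tweet IDs (no request is sent)."""
    for chunk in batched(ids, size):
        # urlencode percent-encodes the commas that separate the IDs.
        yield f"{LOOKUP_URL}?{urlencode({'ids': ','.join(chunk)})}"


if __name__ == "__main__":
    # Placeholder IDs for illustration only; use the IDs from the dataset.
    example_ids = [str(n) for n in range(1, 251)]
    for url in lookup_urls(example_ids):
        print(url[:60], "...")
```

Tweets that have been deleted or made private since collection will not be returned, so the retrieved set may be smaller than the 3,640 IDs listed.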
You can find the full text of our research paper here: https://arxiv.org/abs/2102.12700
If you find this dataset useful in your research, please consider citing:
@article{sabri2021sentiment,
title={Sentiment Analysis of Persian-English Code-mixed Texts},
author={Sabri, Nazanin and Edalat, Ali and Bahrak, Behnam},
journal={arXiv preprint arXiv:2102.12700},
year={2021}
}