nazaninsbr / Persian-English-Code-mixed-Sentiment-Analysis

A Persian-English dataset for the task of code-mixed sentiment analysis

Persian-English Code-mixed Sentiment Analysis

The dataset includes 3,640 Persian-English code-mixed tweets labeled with sentiment values (positive, negative, neutral).

In compliance with Twitter's terms of service, we release only the following fields:

  • Tweet IDs
  • Searched keyword
  • Labels (the individual labels from the three annotators and the overall label calculated using max voting)
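
As a rough illustration of how an overall label can be derived from three annotator labels by max voting, here is a minimal sketch. The function name `max_vote` is hypothetical, and the tie handling (returning `None` when no label has a strict plurality) is an assumption; the dataset's own tie-breaking rule is not described here.

```python
from collections import Counter
from typing import Optional, Sequence


def max_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top count is tied.

    Returning None on a tie is an assumption made for this sketch only.
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # A tie exists when the two most frequent labels have equal counts.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    print(max_vote(["positive", "positive", "neutral"]))
    print(max_vote(["positive", "negative", "neutral"]))
```

With three annotators and three possible labels, a tie only occurs when all three annotators disagree.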

You can retrieve the texts by using Twitter's Developer API.

How to collect data from Twitter: https://chatbotslife.com/crawl-twitter-data-using-30-lines-of-python-code-e3fece99450e

How to get tweets with specific IDs: https://stackoverflow.com/questions/28384588/twitter-api-get-tweets-with-specific-id
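
For orientation, the sketch below shows one possible way to rehydrate tweet texts from the IDs in this dataset. It assumes the Twitter API v2 tweet lookup endpoint (`https://api.twitter.com/2/tweets`, up to 100 comma-separated IDs per request) and a bearer token with sufficient access; endpoint availability and access tiers may have changed, so check the current API documentation. The helper names `batches`, `lookup_url`, and `fetch_texts` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterator, List, Sequence

BATCH_SIZE = 100  # assumed per-request limit of the lookup endpoint
LOOKUP_URL = "https://api.twitter.com/2/tweets"  # assumed v2 endpoint


def batches(ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split the tweet IDs into consecutive chunks of at most `size`."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def lookup_url(ids: Sequence[str]) -> str:
    """Build the lookup URL for one batch of tweet IDs."""
    return LOOKUP_URL + "?" + urllib.parse.urlencode({"ids": ",".join(ids)})


def fetch_texts(ids: Sequence[str], bearer_token: str) -> Dict[str, str]:
    """Map tweet ID to text for every tweet the API still returns.

    Deleted or protected tweets are simply absent from the result.
    """
    texts: Dict[str, str] = {}
    for batch in batches(ids):
        request = urllib.request.Request(
            lookup_url(batch),
            headers={"Authorization": f"Bearer {bearer_token}"},
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        for tweet in payload.get("data", []):
            texts[tweet["id"]] = tweet["text"]
    return texts
```

Calling `fetch_texts(tweet_ids, token)` with the ID column of the dataset would return a dictionary that can be joined back to the labels by tweet ID. Rate limiting and error handling are omitted to keep the sketch short.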

Research paper

You can find the full text of our research paper here: https://arxiv.org/abs/2102.12700

If you find this dataset useful in your research, please consider citing:

@article{sabri2021sentiment,
  title={Sentiment Analysis of Persian-English Code-mixed Texts},
  author={Sabri, Nazanin and Edalat, Ali and Bahrak, Behnam},
  journal={arXiv preprint arXiv:2102.12700},
  year={2021}
}
