pollpredict

Sentiment analysis of tweets to predict future outcomes

This is an experimental project in which I try to predict a future outcome, in this case the 2019 Indian elections, from the sentiment expressed in recent tweets.

How does this work?

For this case, six official Twitter handles are considered, namely:

BJP4India
narendramodi
INCIndia
RahulGandhi
NCPspeaks
PawarSpeaks

I have considered two representative Twitter handles for each of the three largest national political parties in India. I could have considered more but was largely limited by Twitter's APIs (details below).

Recent @mentions of the above Twitter handles are fetched using the Twitter API. (Problems faced with the API are explained below.)
Each fetched tweet is then given a score based on the valence (negative or positive psychological value) of its content, both text and emojis.
An aggregate of the scores is calculated for each Twitter handle. This aggregate score represents the public's cumulative "state" of mind about that particular political figure.
All the tweets, along with their scores, are shown here: PollPRedict
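The aggregation step above could look roughly like the sketch below. It assumes each mention has already been scored and arrives as a `(handle, score)` pair; the function name `aggregate_scores` and the output shape are hypothetical, and since the text does not say whether the aggregate is a sum or an average, the sketch reports both.

```python
from collections import defaultdict


def aggregate_scores(scored_mentions):
    """Combine per-tweet scores into one summary per handle.

    scored_mentions: iterable of (handle, score) pairs, where score is
    the sentiment score already assigned to one tweet mentioning handle.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for handle, score in scored_mentions:
        totals[handle] += score
        counts[handle] += 1
    # Report the sum and the mean, because handles with more mentions
    # would otherwise dominate a plain sum.
    return {
        handle: {
            "total": totals[handle],
            "tweets": counts[handle],
            "mean": totals[handle] / counts[handle],
        }
        for handle in totals
    }


# Made-up scores, for illustration only.
sample = [("narendramodi", 3), ("narendramodi", -1), ("RahulGandhi", 2)]
summary = aggregate_scores(sample)
```

Comparing means rather than totals may be the fairer choice when the handles receive very different numbers of mentions.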

How are tweets scored?

Tweets are scored using the AFINN word list, which assigns each listed word an integer valence between -5 (most negative) and +5 (most positive). Simply put, each tweet is tokenized and its tokens are looked up in the AFINN dataset. The same applies to emojis.
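A minimal sketch of this kind of lexicon lookup is shown below. The tiny `TOY_LEXICON` is a hand-written stand-in with illustrative weights, not the real AFINN data or any emoji ranking, and the tokenizer and the name `score_tweet` are assumptions rather than the project's actual code.

```python
import re

# Illustrative weights only; a real run would load the AFINN word list
# and an emoji lexicon instead of this hand-written dictionary.
TOY_LEXICON = {
    "good": 3, "great": 3, "love": 3,
    "bad": -3, "hate": -3, "terrible": -3,
    "👍": 2, "😡": -3,
}

# A token is either a run of letters/apostrophes or a single symbol
# (which is how single-code-point emojis are picked up).
TOKEN_RE = re.compile(r"[a-z']+|[^\w\s]")


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the valence of every token found in the lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    # Tokens that are not in the lexicon contribute 0.
    return sum(lexicon.get(token, 0) for token in tokens)


positive = score_tweet("Great speech 👍")
negative = score_tweet("terrible and bad 😡")
neutral = score_tweet("no opinion")
```

Note that a plain lookup like this ignores negation ("not good") and sarcasm, so individual scores can be wrong even when the aggregate is informative.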

Assumptions:

Twitter is the network of choice.
English is the language of choice. (However, other languages can be added later.)

Problems:

Twitter started restricting developer access after the Cambridge Analytica debacle, so I was limited in how many requests I could make, and each request returned at most 100 tweets :-(.
Since I used the free tier of the API, I could not request only unique @mentions from the get-go. The results also included retweets, quoted tweets, etc., which I did not want to consider for this project. I wanted only unique Twitter @mentions.
Sorting through retweets was a hassle, since tweets from big political figures tend to get retweeted a lot (tens of thousands of times). However, I figured out a way to get some more data out of these requests, so they were not completely useless for this project.
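One plausible way to drop retweets and quoted tweets after fetching is sketched below. It assumes each tweet is a dict shaped like the Twitter API v1.1 tweet JSON, where retweets carry a `retweeted_status` field and quote tweets have `is_quote_status` set to true. The helper name `keep_original_mentions` is hypothetical, and this is not necessarily the workaround the project uses.

```python
def keep_original_mentions(tweets):
    """Return only original tweets, dropping retweets, quotes and repeats.

    tweets: list of dicts shaped like Twitter API v1.1 tweet objects.
    """
    seen_ids = set()
    kept = []
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # a retweet of someone else's tweet
        if tweet.get("is_quote_status", False):
            continue  # a quote tweet
        if tweet["id"] in seen_ids:
            continue  # the same tweet returned by more than one request
        seen_ids.add(tweet["id"])
        kept.append(tweet)
    return kept


# Made-up tweets, for illustration only.
fetched = [
    {"id": 1, "text": "@INCIndia good rally"},
    {"id": 2, "text": "RT @INCIndia ...", "retweeted_status": {"id": 9}},
    {"id": 3, "text": "@INCIndia see this", "is_quote_status": True},
    {"id": 1, "text": "@INCIndia good rally"},
]
originals = keep_original_mentions(fetched)
```

Filtering on the client like this still spends the limited request quota on tweets that are thrown away, which is the cost described above.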

Future increments:

Additional languages can be added.
Other social networks can be incorporated.
Data from other Twitter handles can be used.
Tweet shorthand and misspellings can be handled in some way.
Live tweet filtering and scoring can be implemented.

crispycrispycrispy / pollpredict