US-election-2020-sentiment-analysis

Predicting US Presidential Election 2020 Result Using Twitter Sentiment Analysis with Python

Dataset creation

Using the Twitter API to collect tweets

Copy the “API Key”, “API Secret”, “Access Token”, and “Access Token Secret” to use as OAuth keys.
Set up authentication with Twitter using the tweepy package.
Extract the replies sent to both Donald Trump and Joe Biden.

import tweepy

candidate_name = ['realDonaldTrump', 'JoeBiden']

replies_trump = []
replies_biden = []
for candidate in candidate_name:
    # 'to:<name>' matches tweets sent in reply to that account.
    # api.search is the tweepy 3.x name; tweepy 4.x renamed it to api.search_tweets.
    for tweet in tweepy.Cursor(api.search, q='to:'+candidate, result_type='recent', timeout=999999).items(10000):
        if candidate == "realDonaldTrump":
            replies_trump.append(tweet)
        elif candidate == "JoeBiden":
            replies_biden.append(tweet)
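The loop above assumes an authenticated `api` object from the authentication step. A minimal sketch of how that object might be built is shown here. The environment variable names and the helpers `load_credentials` and `build_api` are hypothetical; the tweepy calls are the OAuth 1a handler interface that goes with `api.search`.

```python
import os

# Hypothetical environment variable names; use whatever names you store the keys under.
REQUIRED_KEYS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_credentials(env=None):
    """Return the four OAuth values, raising KeyError if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}

def build_api(credentials):
    """Create an authenticated tweepy client from the loaded credentials."""
    import tweepy  # imported here so the credential check runs without tweepy installed

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_API_KEY"], credentials["TWITTER_API_SECRET"]
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"], credentials["TWITTER_ACCESS_TOKEN_SECRET"]
    )
    # wait_on_rate_limit makes tweepy sleep instead of failing when the rate limit is hit.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With the keys exported in the shell, `api = build_api(load_credentials())` would give the object the loop expects. Reading the keys from the environment keeps them out of the notebook and the repository.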

Converting the tweets into DataFrames and saving them as CSV files

import pandas as pd

def replies_to_df(replies):
    # One row per tweet: the author's screen name and the text with newlines flattened.
    rows = [{'user': tweet.user.screen_name, 'text': tweet.text.replace('\n', ' ')} for tweet in replies]
    return pd.DataFrame(rows, columns=['user', 'text'])

trump_df = replies_to_df(replies_trump)
biden_df = replies_to_df(replies_biden)

# Write each file once, after all rows are collected.
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list first.)
trump_df.to_csv(r'data/trump_data.csv')
biden_df.to_csv(r'data/biden_data.csv')

The generated data is available in 'trump_data.csv' and 'biden_data.csv'.
Both datasets contain 2 columns and 10000 rows.
- 'text' - this column contains the tweets sent to '@realDonaldTrump' or '@JoeBiden' respectively.
- 'user' - this column contains the username.

Data Analysis

Importing the datasets
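No code is shown for this step. A minimal sketch of what it might look like, assuming the CSV files were written with `to_csv` (so the first column is the saved index) and that the frames are named `Trump_reviews` and `Biden_reviews` as in the later steps. The helper `load_reviews` is hypothetical, and an in-memory CSV stands in for the real file so the example runs on its own.

```python
import io
import pandas as pd

def load_reviews(source):
    """Read a tweets CSV written by DataFrame.to_csv, dropping rows with no text."""
    reviews = pd.read_csv(source, index_col=0)
    return reviews.dropna(subset=["text"]).reset_index(drop=True)

# Stand-in for data/trump_data.csv, in the same layout to_csv produces.
sample_csv = io.StringIO(
    ",user,text\n"
    "0,alice,Great speech today\n"
    "1,bob,\n"
    "2,carol,Terrible policy\n"
)
Trump_reviews = load_reviews(sample_csv)
print(Trump_reviews.shape)
```

With the real files this would be `Trump_reviews = load_reviews('data/trump_data.csv')`, and likewise for `Biden_reviews`. Dropping empty texts up front avoids errors when TextBlob is later applied to missing values.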

Sentiment analysis using TextBlob

Polarity ranges from -1 to +1 and indicates whether the text expresses negative or positive sentiment.
The polarity function returns the polarity of each tweet.

from textblob import TextBlob

def polarity(review):
  return TextBlob(review).sentiment.polarity

Trump_reviews['polarity'] = Trump_reviews['text'].apply(polarity)
Biden_reviews['polarity'] = Biden_reviews['text'].apply(polarity)

Adding the tag 'Positive', 'Negative' or 'Neutral' according to the polarity

import numpy as np

# Start with Positive/Negative, then overwrite exact zeros as Neutral.
Trump_reviews['Expression'] = np.where(Trump_reviews['polarity']>0,'Positive','Negative')
Trump_reviews.loc[Trump_reviews.polarity == 0, 'Expression'] = 'Neutral'
Trump_reviews.head()

Biden_reviews['Expression'] = np.where(Biden_reviews['polarity']>0,'Positive','Negative')
Biden_reviews.loc[Biden_reviews.polarity == 0, 'Expression'] = 'Neutral'
Biden_reviews.head()
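The two-step labelling above can also be written as a single pass. A sketch using `np.select` on made-up polarity values; the function name `label_polarity` is hypothetical and is not part of this project.

```python
import numpy as np
import pandas as pd

def label_polarity(polarity):
    """Map polarity scores to 'Positive', 'Negative' or 'Neutral'."""
    conditions = [polarity > 0, polarity < 0]
    choices = ["Positive", "Negative"]
    # Anything matching neither condition (polarity == 0) falls through to the default.
    return np.select(conditions, choices, default="Neutral")

toy = pd.DataFrame({"polarity": [0.8, -0.3, 0.0, 0.1]})
toy["Expression"] = label_polarity(toy["polarity"])
print(toy["Expression"].value_counts())
```

The same helper could be applied to both frames, which avoids repeating the labelling lines once per candidate.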

Visualizing to find Positive, Negative and Neutral

Dropping all neutral data, since it does not add value to the analysis

Trump_reviews.drop((Trump_reviews[Trump_reviews['polarity']==0]).index, inplace=True)
print(Trump_reviews.shape)
Biden_reviews.drop((Biden_reviews[Biden_reviews['polarity']==0]).index, inplace=True)
print(Biden_reviews.shape)

After dropping the neutral data, the two datasets are uneven in size. To balance them, I use the 'balanced_data' function, which drops n randomly chosen rows from a dataset.

def balanced_data(reviews,n):
  np.random.seed(10)
  drop = np.random.choice(reviews.index,n,replace=False)
  review_subset = reviews.drop(drop)
  return review_subset

Trump_subset = balanced_data(Trump_reviews,99)
print(Trump_subset.shape)

Biden_subset = balanced_data(Biden_reviews,300)
print(Biden_subset.shape)

After balancing the data we have 4000 rows in each dataset.
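The drop counts 99 and 300 used above are specific to this particular scrape. A sketch of a variant that works out the target size itself, by downsampling both frames to the size of the smaller one, is shown here. `balance_pair` is a hypothetical helper, not the function used in this project, and it relies on `DataFrame.sample` rather than `np.random.choice`.

```python
import pandas as pd

def balance_pair(first, second, seed=10):
    """Randomly downsample both frames to the row count of the smaller one."""
    target = min(len(first), len(second))
    return (
        first.sample(n=target, random_state=seed),
        second.sample(n=target, random_state=seed),
    )

# Toy frames of unequal size standing in for the two review datasets.
a = pd.DataFrame({"polarity": [0.1, 0.5, -0.2, 0.9, -0.7]})
b = pd.DataFrame({"polarity": [0.3, -0.4, 0.6]})
a_bal, b_bal = balance_pair(a, b)
print(a_bal.shape, b_bal.shape)
```

Fixing `random_state` plays the same role as `np.random.seed(10)` in `balanced_data`: the subset is reproducible between runs.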

Data Visualization

Donald Trump
- From the figure below, one can see that polarity ranges from -1 to +1 and that more replies are positive than negative, since the polarity is mostly concentrated between 0 and 0.5.
- From the boxplot below, one can see that most of the polarity lies between -0.25 and 0.50. The boxplot shows only where the polarity is concentrated.
- Analyzing the most positive and most negative replies
  - Note: Based on the insights I gained from this project, the TextBlob sentiment analyzer is not effective at detecting sarcastic comments, since it works on the tokens of a sentence and classifies accordingly.
- Word clouds can be useful in business for finding customers' pain points. Here I use them to get insight into public opinion about the presidential candidate and the keywords most frequently used by citizens.

Joe Biden
- From the figure below, one can see that polarity ranges from -1 to +1 and that more replies are positive than negative, since the polarity is mostly concentrated between 0 and 0.5.
- From the boxplot below, one can see that most of the polarity lies between -0.25 and 0.50. The boxplot shows only where the polarity is concentrated.
- Analyzing the most positive and most negative replies
  - Note: Based on the insights I gained from this project, the TextBlob sentiment analyzer is not effective at detecting sarcastic comments, since it works on the tokens of a sentence and classifies accordingly.
- Word clouds can be useful in business for finding customers' pain points. Here I use them to get insight into public opinion about the presidential candidate and the keywords most frequently used by citizens.
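A word cloud is a picture of word frequencies, so the same keywords can also be checked numerically. A minimal sketch using only the standard library; the stop-word list and the function name `top_keywords` are illustrative assumptions, and a real run would pass the `text` column instead of the toy list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "for", "in", "you", "i"}

def top_keywords(texts, n=10):
    """Count the most frequent words across tweets, ignoring mentions, links and stop words."""
    counts = Counter()
    for text in texts:
        # Strip @mentions and URLs so the candidate handles do not dominate the counts.
        text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

tweets = [
    "@JoeBiden vote for the future",
    "@realDonaldTrump the vote is rigged https://t.co/x",
    "Vote early and vote safely",
]
print(top_keywords(tweets, n=3))
```

Removing the mentions matters here because every reply starts with the candidate's handle, which would otherwise be the top keyword in both datasets.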

People Sentiment
- From the figures below, it is evident that Joe Biden is getting more positive replies than negative ones.
- The overall sentiment is more favourable to Joe Biden than to Donald Trump.
  - Note: I am assuming that all the users are unique. Hence, I have not removed the users who commented on both Joe Biden and Donald Trump.
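The uniqueness assumption in the note above could be dropped by filtering out users who replied to both candidates. A sketch on toy data; `drop_shared_users` is a hypothetical helper and was not part of the analysis reported here.

```python
import pandas as pd

def drop_shared_users(first, second):
    """Remove rows whose 'user' appears in both frames."""
    shared = set(first["user"]) & set(second["user"])
    return (
        first[~first["user"].isin(shared)],
        second[~second["user"].isin(shared)],
    )

# Toy frames: 'bob' replied to both candidates and is removed from each.
trump = pd.DataFrame({"user": ["ann", "bob", "cat"], "polarity": [0.2, -0.5, 0.7]})
biden = pd.DataFrame({"user": ["bob", "dan"], "polarity": [0.4, 0.1]})
trump_only, biden_only = drop_shared_users(trump, biden)
print(list(trump_only["user"]), list(biden_only["user"]))
```

Whether to remove such users is a judgment call: someone who replies to both candidates may still express a clear preference, so comparing the results with and without the filter would show how much the assumption matters.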

jijopjames / US-election-2020-sentiment-analysis
