RichardAbraham / Text_Analytics_NLP

“With great power comes great responsibility.” - Stan Lee. Here, I analyze social media's response, specifically on Twitter, to the life and death of Marvel's creative leader, Stan Lee.

TEXT ANALYTICS USING NLP (Natural Language Processing)

Web scraping - Twitter data

SUMMARY:

Stan Lee, one of America's most prolific comic book writers, died in Los Angeles at the age of ninety-five on November 12, 2018. Here, I aim to analyze social media's response, particularly Twitter, to his death.

PROJECT GOALS:

  • Explore tweets to reveal interesting insights about user activity after his death.
  • Build a machine learning model that accurately classifies the sentiment of a tweet as positive, neutral, or negative.

PACKAGES USED:

  • scikit-learn, NumPy, pandas, NLTK, TextBlob, Matplotlib, and Tweepy, among others.

MOTIVATION:

Stan Lee was an American comic book writer, editor, publisher, and producer. He rose through the ranks of a family-run business to become Marvel Comics' primary creative leader for two decades, leading its expansion from a small division of a publishing house to a multimedia corporation that dominated the comics industry. Lee was inducted into the comic book industry's Will Eisner Award Hall of Fame in 1994 and the Jack Kirby Hall of Fame in 1995. He received the NEA's National Medal of Arts in 2008.

As a fan of Marvel comics myself, I wanted to explore his life and work in greater detail using machine learning!

DATA COLLECTION:

Data for the analysis was collected through Twitter's public APIs. (See "How to extract tweets using Twitter's public APIs".)

  • I used the following keywords to filter the extraction: Stan Lee, StanLee, Stanley Martin Lieber.

PS: Adil Moujahid does a great job of introducing text mining using Twitter's streaming API and Python.

  • Refer to "Historical Tweets Extraction - Web Scrapping.ipynb" for steps to extract historical tweets as needed.
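As a rough illustration of the keyword filter, here is a minimal sketch that checks tweet text against the three keywords listed above. It is not the notebook's code: the names `KEYWORDS`, `matches_keywords`, and the sample tweets are made up for the example, and it runs on plain strings rather than on a live Twitter stream.

```python
import re

# Keywords from the list above. Matching is case-insensitive and tolerates
# extra whitespace inside multi-word phrases.
KEYWORDS = ["Stan Lee", "StanLee", "Stanley Martin Lieber"]

_PATTERN = re.compile(
    "|".join(r"\s+".join(re.escape(word) for word in kw.split()) for kw in KEYWORDS),
    flags=re.IGNORECASE,
)


def matches_keywords(text: str) -> bool:
    """Return True if the tweet text mentions any tracked keyword."""
    return _PATTERN.search(text) is not None


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "RIP Stan Lee, thank you for everything",
    "#StanLee forever",
    "Stanley  Martin Lieber was his birth name",
    "Unrelated tweet about the weather",
]
kept = [t for t in sample_tweets if matches_keywords(t)]
```

When collecting through the API itself, the same keyword list would normally be passed to the API's own filter, so a local check like this is mostly useful for cleaning previously saved data.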

DESCRIPTIVE ANALYTICS (EDA)

  • Tools used include Python, Tableau, and Microsoft Power BI.

Top 5 Languages used to tweet

[Figure: top 5 languages used to tweet]

English comes in at #1, followed by Spanish.
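A ranking like this can be computed by counting the `lang` field of each tweet. The sketch below assumes the tweets are already loaded as dictionaries carrying a `lang` code, as in Twitter's tweet JSON. The function name `top_languages` and the sample records are hypothetical.

```python
from collections import Counter


def top_languages(tweets, n=5):
    """Return the n most common values of each tweet's 'lang' field."""
    counts = Counter(t["lang"] for t in tweets if t.get("lang"))
    return counts.most_common(n)


# Hypothetical records; real tweets carry many more fields.
sample = [
    {"lang": "en"},
    {"lang": "es"},
    {"lang": "en"},
    {"lang": "pt"},
    {"lang": None},  # missing language codes are skipped
    {},
]
top = top_languages(sample)
```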

Time series of number of likes vs. date of creation (around the time of his death):

[Figure: number of tweets following Stan Lee's death]

We see a surge in activity after his death.
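One simple way to make such a surge visible is to count tweets per calendar day. The sketch below assumes timestamps in the `created_at` string format used by Twitter's v1.1 tweet JSON. The helper `tweets_per_day` and the sample timestamps are made up for illustration, and the sketch counts tweets rather than likes.

```python
from collections import Counter
from datetime import datetime

# Format of the 'created_at' field, e.g. "Mon Nov 12 19:02:11 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(created_at_values):
    """Count tweets per calendar day, returned in chronological order."""
    days = Counter()
    for raw in created_at_values:
        days[datetime.strptime(raw, TWITTER_TIME_FORMAT).date()] += 1
    return dict(sorted(days.items()))


# Hypothetical timestamps around November 12, 2018.
sample = [
    "Sun Nov 11 23:10:00 +0000 2018",
    "Mon Nov 12 19:02:11 +0000 2018",
    "Mon Nov 12 20:45:30 +0000 2018",
]
daily = tweets_per_day(sample)
```

The resulting mapping of dates to counts can be handed directly to a plotting tool such as Matplotlib.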

Percent (%) distribution of content sources

[Figure: % distribution of content sources]

The majority of the tweets were made using a mobile device.
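In Twitter's v1.1 tweet JSON, the `source` field is an HTML anchor naming the client app, so a distribution like this needs the tag stripped before counting. The following is a sketch under that assumption. The function names and sample values are hypothetical and are not taken from the notebook.

```python
import re
from collections import Counter

_TAG = re.compile(r"<[^>]+>")


def source_label(source_html):
    """Strip the HTML anchor and keep only the client name."""
    return _TAG.sub("", source_html).strip()


def percent_distribution(labels):
    """Map each label to its share of the total, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}


# Hypothetical 'source' values in the anchor format described above.
sample_sources = [
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
]
shares = percent_distribution(source_label(s) for s in sample_sources)
```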

Basemap displaying the location of tweets

[Figure: basemap of tweet locations]

SENTIMENT ANALYSIS

Word cloud

[Figure: word cloud]

Important words include:

  1. angeles
  2. awesome
  3. respect
  4. memorial

Percent (%) distribution of sentiments

[Figure: sentiment distribution]

The majority of the tweets were of a positive sentiment.
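To show how polarity scores can be turned into the three sentiment classes, here is a small sketch. It assumes each tweet already has a polarity score in [-1, 1], for example from `TextBlob(text).sentiment.polarity`. The threshold rule (above zero is positive, below zero is negative) is an assumption for illustration, not necessarily the rule used in the notebook, and the scores are made up.

```python
from collections import Counter


def label_sentiment(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    Scores within +/- neutral_band of zero count as neutral (assumed rule).
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical polarity scores, one per tweet.
polarities = [0.8, 0.35, 0.0, -0.4, 0.1]
labels = [label_sentiment(p) for p in polarities]

# Percent share of each label.
counts = Counter(labels)
distribution = {label: 100.0 * n / len(labels) for label, n in counts.items()}
```

Widening `neutral_band` moves weakly scored tweets into the neutral class, which changes the distribution noticeably on short texts.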

For more findings, please go to the "Images" folder.

FILE CONTENTS:

Text Analytics using NLP - Web Scrapping.ipynb: Contains coded steps undertaken to

  1. Extract the relevant tweets
  2. Pre-process and structure the data for analysis
  3. Carry out some descriptive analytics
  4. Perform sentiment analysis and build a model for sentiment classification
  • Logistic Regression performed the best, with an accuracy of 98% and an average F1 score of 0.97.
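For readers who want a feel for the classification step, below is a minimal sketch of a logistic regression sentiment classifier in scikit-learn. It is an assumption-laden toy rather than the notebook's pipeline. The TF-IDF features, the tiny hand-written dataset, and the macro-averaged F1 are all choices made for this example, and it will not reproduce the scores reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline

# Made-up training examples; a real run would use the labeled tweets.
train_texts = [
    "thank you stan lee you were awesome",
    "so much respect for a true legend",
    "great memories and great heroes",
    "this is terrible news i am heartbroken",
    "awful day sad and angry",
    "worst news of the year",
    "memorial held in los angeles today",
    "stan lee died at the age of 95",
    "the announcement was made on monday",
]
train_labels = [
    "positive", "positive", "positive",
    "negative", "negative", "negative",
    "neutral", "neutral", "neutral",
]

# TF-IDF features feeding a multiclass logistic regression.
model = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# Made-up held-out examples for evaluation.
test_texts = ["awesome legend", "terrible sad news", "memorial on monday"]
test_labels = ["positive", "negative", "neutral"]

predicted = model.predict(test_texts)
accuracy = accuracy_score(test_labels, predicted)
macro_f1 = f1_score(test_labels, predicted, average="macro")
```

On real data, the evaluation would use a proper train/test split, for example via `sklearn.model_selection.train_test_split`, rather than hand-picked examples.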

Please feel free to suggest improvements or to use any of the steps shown above, and have fun coding!
