Twitter Semisupervised Sentiment Analysis
This repository was made for an NLP course project (Apr 2018).
Dependencies:
Dataset:
Semisupervised sentiment analysis of the tweets of a Twitter account over time.
Steps:
- Collect all tweets of an account in a JSON file with the following format:
{
"source": "Twitter for iPhone",
"text": "Some text",
"created_at": "Sun Jul 08 21:58:52 +0000 2018",
"retweet_count": 64399,
"favorite_count": 183994,
"is_retweet": false,
"id_str": "1016079192604139520"
}
- Use NLTK for tokenization and lemmatization.
- Based on the AFINN word list, each word is given an integer score from +5 (very positive) to -5 (very negative).
- Use scikit-learn to calculate precision, recall, and F1-score.
- Use Matplotlib to plot a histogram of the sentiment scores over time.
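The collection step could be sketched in Python as follows. This is a minimal sketch, not the project's code: it assumes the file holds a JSON array of objects shaped like the example above, and the names `parse_tweets` and `load_tweets` are hypothetical. The `created_at` string is parsed with `datetime.strptime` so tweets can later be grouped by date.

```python
import json
from datetime import datetime
from typing import Any, Dict, List

# Layout of the "created_at" field, e.g. "Sun Jul 08 21:58:52 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweets(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the fields needed for analysis and parse timestamps into datetimes."""
    tweets = []
    for record in records:
        tweets.append({
            "id": record["id_str"],
            "text": record["text"],
            "is_retweet": record.get("is_retweet", False),
            "created_at": datetime.strptime(record["created_at"], CREATED_AT_FORMAT),
        })
    # Oldest first, so later steps can walk the account's history in order.
    tweets.sort(key=lambda t: t["created_at"])
    return tweets


def load_tweets(path: str) -> List[Dict[str, Any]]:
    """Read a JSON array of tweet objects (assumed file layout) from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_tweets(json.load(f))
```

Keeping `is_retweet` lets a later step decide whether retweets should count toward the account's own sentiment.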
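The tokenization and AFINN scoring steps might look roughly like the sketch below. To stay self-contained it swaps NLTK for a simple regular-expression tokenizer and skips lemmatization; in the project itself, `nltk.word_tokenize` and `nltk.stem.WordNetLemmatizer().lemmatize` would fill that role. `SAMPLE_LEXICON` is a hypothetical toy excerpt, not the real AFINN scores, and summing the word scores of a tweet is an assumed aggregation rule. `load_afinn` assumes the tab-separated `word<TAB>score` layout of the published AFINN files.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon in AFINN style (word -> integer score in [-5, +5]).
# For real use, load the AFINN word list with load_afinn().
SAMPLE_LEXICON: Dict[str, int] = {"good": 3, "great": 3, "love": 3, "bad": -3, "hate": -3, "sad": -2}


def load_afinn(path: str) -> Dict[str, int]:
    """Read an AFINN file with one 'word<TAB>score' entry per line."""
    lexicon: Dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # rsplit keeps multi-word entries (which contain spaces) intact.
            word, score = line.rsplit("\t", 1)
            lexicon[word] = int(score)
    return lexicon


def tokenize(text: str) -> List[str]:
    """Lower-case word tokenizer; a stand-in for NLTK tokenization + lemmatization."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the lexicon scores of all tokens; words not in the lexicon score 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def label(score: int) -> str:
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `score_tweet("I love this, great day", SAMPLE_LEXICON)` adds 3 for "love" and 3 for "great", giving 6.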
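Precision, recall, and F1-score need gold labels to compare the predicted labels against. The README does not say where those come from, so this sketch assumes a hand-labelled subset of tweets. It computes the per-class metrics from raw counts so it runs without extra packages; in the project, scikit-learn's `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, labels=..., average=None)` or `classification_report` should report the same per-class values. The function name `per_class_metrics` is hypothetical.

```python
from typing import Dict, List, Sequence


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each label, treating that label as the positive class."""
    results: Dict[str, Dict[str, float]] = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        # Undefined ratios are reported as 0.0, like scikit-learn's default.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[lab] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```

With gold labels `["positive", "positive", "negative", "negative"]` and predictions `["positive", "negative", "negative", "negative"]`, the "positive" class gets precision 1.0, recall 0.5, and F1 of about 0.667.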
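For the plot, one reasonable reading of "a histogram over time" is one bar per calendar month showing the mean sentiment score of that month's tweets. The sketch below assumes that monthly binning, which is a hypothetical choice since the README does not fix the bin size; `monthly_mean_scores` and `plot_monthly` are illustrative names. Matplotlib is imported inside `plot_monthly` so the aggregation helper works without it.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def monthly_mean_scores(scored: Iterable[Tuple[datetime, float]]) -> Dict[str, float]:
    """Average the sentiment scores of all tweets posted in each calendar month."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for created_at, score in scored:
        key = created_at.strftime("%Y-%m")  # e.g. "2018-07"
        sums[key] += score
        counts[key] += 1
    # Sorted keys give chronological order because of the YYYY-MM format.
    return {key: sums[key] / counts[key] for key in sorted(sums)}


def plot_monthly(monthly: Dict[str, float], out_path: str = "sentiment_over_time.png") -> None:
    """Draw one bar per month (green >= 0, red < 0) and save the figure to out_path."""
    import matplotlib.pyplot as plt  # local import: only this function needs Matplotlib

    months = list(monthly)
    values = [monthly[m] for m in months]
    colors = ["tab:green" if v >= 0 else "tab:red" for v in values]

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(months, values, color=colors)
    ax.axhline(0, color="black", linewidth=0.8)
    ax.set_xlabel("Month")
    ax.set_ylabel("Mean sentiment score per tweet")
    ax.set_title("Sentiment over time")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Here `scored` holds one `(timestamp, score)` pair per tweet; plotting counts of positive and negative tweets per month instead of the mean would be an equally valid variant.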