Clean tweets: Text pre-processing
- Frequency counts of tokens
- Replace everything except alphanumeric characters, whitespace and periods
- Replace runs of whitespace with a single space
- Replace usernames with a generic token: USERNAME
- Replace all URLs
- Find all URLs, enumerate and print
- Remove some emojis, keep some emojis 🤣
- Replace punctuation
- Replace a token with another token
- Pandas
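
The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the regular expressions, the placeholder tokens `URL` and `USERNAME`, the emoji ranges, the `KEEP_EMOJIS` set, and the helper names `find_urls` and `clean_tweet` are all assumptions made for this example. Punctuation replacement is folded into the "everything except alphanumerics, whitespace and periods" step.

```python
import re
from collections import Counter

# Assumed patterns; real tweets may need stricter or broader rules.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
# Rough emoji ranges (assumption): pictographs plus misc symbols and dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
KEEP_EMOJIS = {"🤣"}  # hypothetical set of emojis to keep
# Everything except ASCII alphanumerics, whitespace, periods and kept emojis.
NON_ALNUM_RE = re.compile(
    "[^A-Za-z0-9\\s." + "".join(re.escape(e) for e in KEEP_EMOJIS) + "]"
)


def find_urls(text):
    """Return all URLs found in the text, in order of appearance."""
    return URL_RE.findall(text)


def clean_tweet(text, token_map=None):
    """Apply the cleaning steps in an order that keeps URLs and usernames intact."""
    text = URL_RE.sub("URL", text)        # replace all URLs
    text = USER_RE.sub("USERNAME", text)  # replace usernames with a generic token
    # Remove emojis unless they are in the keep set.
    text = EMOJI_RE.sub(
        lambda m: m.group(0) if m.group(0) in KEEP_EMOJIS else " ", text
    )
    text = NON_ALNUM_RE.sub(" ", text)        # drop remaining punctuation and symbols
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    if token_map:
        # Replace a token with another token.
        text = " ".join(token_map.get(tok, tok) for tok in text.split(" "))
    return text


tweet = "@alice LOL 🤣🔥 check https://example.com/a?b=1 &   http://t.co/xyz !!! gr8 stuff."

# Find all URLs, enumerate and print.
urls = find_urls(tweet)
for i, url in enumerate(urls, start=1):
    print(i, url)

cleaned = clean_tweet(tweet, token_map={"gr8": "great"})
print(cleaned)

# Frequency counts of tokens.
counts = Counter(cleaned.lower().split())
print(counts.most_common(3))
```

URLs and usernames are replaced before the character filter runs, because stripping punctuation first would break `https://` and `@` and leave nothing for those patterns to match. To run this over a pandas column, the same function can be passed to `Series.apply`.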
Basic plotting with matplotlib and seaborn
Exploratory lexicon analysis
"DepecheMood: a Lexicon for Emotion Analysis from Crowd-Annotated News" by Jacopo Staiano and Marco Guerini, published in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
A high-coverage and high-precision lexicon of roughly 37,000 terms, automatically annotated with emotion scores.
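
As a rough illustration of how such a lexicon might be used in exploratory analysis, the sketch below averages per-emotion scores over the tokens of a cleaned tweet and reports how many tokens the lexicon covers. The lexicon here is a toy dictionary with invented values and only two emotion labels. It does not reproduce the real lexicon's file format, labels, or scores, and `score_tokens` is a hypothetical helper.

```python
from collections import defaultdict

# Toy stand-in for an emotion lexicon: token -> {emotion: score}.
# All values are invented for illustration only.
TOY_LEXICON = {
    "great": {"HAPPY": 0.6, "SAD": 0.1},
    "awful": {"HAPPY": 0.1, "SAD": 0.7},
    "news": {"HAPPY": 0.3, "SAD": 0.3},
}


def score_tokens(tokens, lexicon):
    """Average the emotion scores of tokens found in the lexicon.

    Returns (averages, coverage), where coverage is the fraction of
    tokens that had a lexicon entry.
    """
    totals = defaultdict(float)
    matched = 0
    for tok in tokens:
        scores = lexicon.get(tok.lower())
        if scores is None:
            continue  # token not covered by the lexicon
        matched += 1
        for emotion, value in scores.items():
            totals[emotion] += value
    if matched == 0:
        return {}, 0.0
    averages = {emotion: total / matched for emotion, total in totals.items()}
    return averages, matched / len(tokens)


averages, coverage = score_tokens(["Great", "news", "today"], TOY_LEXICON)
print(averages, coverage)
```

Tracking coverage alongside the averages helps show when a score rests on only one or two matched tokens, which is common for short texts such as tweets.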