suzana-ilic / misc

Clean tweets: Text pre-processing

  1. Frequency counts of tokens
  2. Replace everything except alphanumeric characters, whitespace and periods
  3. Replace multiple whitespaces with one
  4. Replace usernames with generic: USERNAME
  5. Replace all urls
  6. Find all urls, enumerate and print
  7. Remove some emojis, keep some emojis 🤣
  8. Replace punctuation
  9. Replace a token with another token
  10. Pandas
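Several of the steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names, the regular expressions, and the `URL` placeholder are assumptions, and only the `USERNAME` token comes from the list above. "Alphanumeric" is read here as ASCII letters and digits.

```python
import re
from collections import Counter

# Assumed patterns: a simple URL matcher and a Twitter-style @handle matcher.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")


def find_urls(text):
    """Return every URL-like substring found in the text (step 6)."""
    return URL_RE.findall(text)


def clean_tweet(text):
    """Apply steps 2-5 to a single tweet and return the cleaned string."""
    # URLs and usernames are replaced first, because stripping punctuation
    # beforehand would delete the "@" and "://" that identify them.
    text = URL_RE.sub("URL", text)        # "URL" placeholder is an assumption
    text = USER_RE.sub("USERNAME", text)  # generic token named in the list
    # Keep only ASCII letters, digits, whitespace and periods (step 2).
    text = re.sub(r"[^A-Za-z0-9\s.]", "", text)
    # Collapse runs of whitespace into a single space (step 3).
    return re.sub(r"\s+", " ", text).strip()


def token_counts(texts):
    """Count lowercased, whitespace-separated tokens across texts (step 1)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


if __name__ == "__main__":
    tweet = "@bob check   https://t.co/abc!! Great, stuff."
    # Enumerate and print the URLs before they are replaced.
    for i, url in enumerate(find_urls(tweet), start=1):
        print(i, url)
    print(clean_tweet(tweet))
```

The order differs from the list on purpose: running step 2 first would remove the characters that steps 4 and 5 rely on to recognise usernames and URLs.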

Basic plotting with matplotlib and seaborn

Exploratory lexicon analysis

"DepecheMood: a Lexicon for Emotion Analysis from Crowd-Annotated News" by Jacopo Staiano and Marco Guerini, published in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).

A high-coverage and high-precision lexicon of roughly 37,000 terms, automatically annotated with emotion scores.
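One common way to explore a lexicon of this kind is to average the emotion scores of the tokens it covers. The sketch below assumes a plain dictionary mapping each term to a dictionary of emotion scores; the terms, emotion labels, and values are made up for illustration and do not reflect the lexicon's actual file format or contents.

```python
def score_text(tokens, lexicon):
    """Average the emotion scores of all tokens that appear in the lexicon.

    Returns an empty dict when no token is covered.
    """
    matched = [lexicon[token] for token in tokens if token in lexicon]
    if not matched:
        return {}
    emotions = matched[0].keys()
    return {
        emotion: sum(scores[emotion] for scores in matched) / len(matched)
        for emotion in emotions
    }


# Hypothetical toy lexicon: two terms, two emotion dimensions.
toy_lexicon = {
    "win": {"HAPPY": 0.8, "SAD": 0.2},
    "loss": {"HAPPY": 0.2, "SAD": 0.6},
}

if __name__ == "__main__":
    print(score_text(["the", "win", "and", "the", "loss"], toy_lexicon))
```

Tokens missing from the lexicon are skipped rather than counted as zero, so the average reflects only the covered vocabulary.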
