Twitter Analysis

This repo contains my work to extract and clean the tweets of Donald Trump (POTUS at the time of writing) and to apply Entity Extraction to them.

Note that the entity extraction model is based upon the excellent Entity Extraction tutorial listed under Useful links & Further Reading.

Setup

Install git
Download & install Anaconda
Register for a Twitter access token
Clone this repo: git clone https://github.com/Tommo565/twitter-analysis
Navigate to the cloned folder: cd twitter-analysis
Create a directory for the entity recognition corpus: mkdir ner_corpus
Download the Groningen Meaning Bank (GMB) and unzip it into the ner_corpus folder
Install the dependencies: pip install -r requirements.txt
Open the core/secrets_example.py file in a text editor and update it with your Twitter credentials
Rename the secrets_example.py file to secrets.py
Start the Jupyter Notebook: jupyter notebook
Away you go =)
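Once the GMB is unzipped into ner_corpus, its annotations have to be read into Python before a model can be trained on them. The sketch below is one way to do that and is not the code used in this repo. It assumes the GMB layout in which each document folder holds an en.tags file with one tab-separated token per line, a blank line between sentences, the word in column 0, the POS tag in column 1 and the NER tag in column 3. The function name read_gmb is hypothetical.

```python
import os
from typing import Iterator, List, Tuple

# (word, POS tag, coarse NER tag)
Token = Tuple[str, str, str]


def read_gmb(corpus_root: str) -> Iterator[List[Token]]:
    """Yield each sentence found under corpus_root as a list of tokens.

    Assumption: annotations live in files ending in ".tags", one token per
    line (tab-separated), with sentences separated by a blank line.
    """
    for root, _dirs, files in os.walk(corpus_root):
        for filename in sorted(files):
            if not filename.endswith(".tags"):
                continue
            with open(os.path.join(root, filename), encoding="utf-8") as handle:
                content = handle.read()
            for block in content.split("\n\n"):
                sentence = []
                for line in block.split("\n"):
                    if not line.strip():
                        continue  # skip empty trailing lines
                    columns = line.split("\t")
                    word, pos, ner = columns[0], columns[1], columns[3]
                    # Keep only the coarse entity type, e.g. "geo-nam" -> "geo".
                    if ner != "O":
                        ner = ner.split("-")[0]
                    sentence.append((word, pos, ner))
                if sentence:
                    yield sentence
```

Because read_gmb is a generator, next(read_gmb("ner_corpus")) returns just the first sentence without loading the whole corpus into memory, which is handy for a quick check that the unzip step worked.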

Entity Extraction

What is Entity Extraction?

Entity Extraction (also called Named Entity Extraction or Named Entity Recognition, NER) is the process of identifying and classifying real-world entities (e.g. people, organisations, events, currencies, dates and times) mentioned in text. Because text is an unstructured data source, automatically identifying these entities adds structure to it, which is of real value to anyone who needs to make sense of large volumes of text quickly.
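Entity tagging is commonly expressed per word in IOB format: "B-x" begins an entity of type x, "I-x" continues it and "O" marks words outside any entity. The sketch below groups such tags into whole entities. The tags in the example are written by hand for illustration (they are not output from this repo's model), and the type labels per, geo and tim are assumed to follow the GMB's coarse categories for people, geographical locations and times.

```python
from typing import List, Optional, Tuple


def iob_to_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (word, IOB tag) pairs into (entity text, entity type) pairs."""
    entities = []
    words: List[str] = []
    current_type: Optional[str] = None
    for word, tag in tagged:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then start a new one of this type.
            if words:
                entities.append((" ".join(words), current_type))
            words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            words.append(word)  # continue the current entity
        else:
            # "O": close any open entity.
            if words:
                entities.append((" ".join(words), current_type))
            words, current_type = [], None
    if words:
        entities.append((" ".join(words), current_type))
    return entities


# Hand-written tags for illustration only.
example = [
    ("Donald", "B-per"), ("Trump", "I-per"), ("met", "O"), ("officials", "O"),
    ("in", "O"), ("Paris", "B-geo"), ("on", "O"), ("Monday", "B-tim"), (".", "O"),
]
print(iob_to_entities(example))
# [('Donald Trump', 'per'), ('Paris', 'geo'), ('Monday', 'tim')]
```

An "I-" tag whose type differs from the open entity is treated as the start of a new entity, a lenient choice that keeps slightly inconsistent tagger output usable.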

How does it work?

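Language is ambiguous, which makes it very difficult for machines to tell apart all the possible meanings of a word (e.g. 'crane' could refer to a bird, a lifting machine used in construction, or the act of stretching your neck), so a simple keyword-matching approach cannot be used. Instead, a model is trained on an annotated corpus (here the Groningen Meaning Bank) to predict an entity tag for each word from its surrounding context as well as from the word itself.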
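To make "context" concrete, a tagger usually describes each word by a set of features covering the word and its neighbours, and a classifier learns which features predict which tag. The sketch below builds such a feature dictionary; the function name token_features, the feature names and the <s> and </s> padding markers are hypothetical choices for illustration, not this repo's or the tutorial's exact feature set. It shows that the word 'crane' looks identical on its own but differs once its neighbours are included.

```python
from typing import Dict, List, Tuple


def token_features(sentence: List[Tuple[str, str]], index: int) -> Dict[str, object]:
    """Describe the token at `index` using the word itself and its neighbours.

    `sentence` is a list of (word, POS tag) pairs. Padding markers stand in
    for the positions before the first word and after the last word.
    """
    padded = [("<s>", "<s>")] + list(sentence) + [("</s>", "</s>")]
    i = index + 1  # shift to account for the start padding
    word, pos = padded[i]
    prev_word, prev_pos = padded[i - 1]
    next_word, next_pos = padded[i + 1]
    return {
        "word": word.lower(),
        "pos": pos,
        "is_capitalised": word[:1].isupper(),
        "prev_word": prev_word.lower(),
        "prev_pos": prev_pos,
        "next_word": next_word.lower(),
        "next_pos": next_pos,
    }


bird = [("The", "DT"), ("crane", "NN"), ("flew", "VBD"), ("south", "RB")]
machine = [("The", "DT"), ("crane", "NN"), ("lifted", "VBD"), ("steel", "NN")]
f_bird, f_machine = token_features(bird, 1), token_features(machine, 1)
print(f_bird["word"] == f_machine["word"])  # True: the word alone is identical
print(f_bird["next_word"], f_machine["next_word"])  # flew lifted
```

Dictionaries like these can be passed to a feature-based classifier, for example one of NLTK's classifiers, which is trained on the annotated corpus to map each feature set to an entity tag.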

Useful links & Further Reading

Generating an access token: https://www.slickremix.com/docs/how-to-get-api-keys-and-tokens-for-twitter/
Dive into NLTK: http://textminingonline.com/dive-into-nltk-part-iv-stemming-and-lemmatization
The nltk.tag.stanford module: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford
Download the Stanford CoreNLP tools: https://stanfordnlp.github.io/CoreNLP/
Stanford CoreNLP GitHub: https://github.com/stanfordnlp/CoreNLP
Installing CoreNLP: https://stanfordnlp.github.io/CoreNLP/cmdline.html
Sentiment Analysis on Trump's Tweets using Python: https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-
NLTK POS Tag list: https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/
Word Class descriptions: https://en.oxforddictionaries.com/grammar/word-classes-or-parts-of-speech
Flattening Trees: https://www.packtpub.com/books/content/python-text-processing-nltk-2-transforming-chunks-and-trees
Entity Extraction: http://nlpforhackers.io/named-entity-extraction/
