NHS-R 2020 Text Analysis Workshop Materials
The GitHub repository is here: https://github.com/dleng2242/NHS-R_2020_TextAnalysis
Many organisations hold large amounts of data stored as free text that are not being used effectively. In this introductory workshop we show how to get started with analysing text data, from simple manipulation through to sentiment analysis. By the end of the workshop, attendees will have a good understanding of these techniques and of how to implement them in R.
Simple Text Manipulation
- Regular Expressions
- Tidy Text Format
- Removing Stop Words and Stemming
- Word Clouds
- Tokenisation and n-grams
Sentiment Analysis
- Sentiment Lexicons
- Joining Sentiments to Documents
Word and Document Frequency
- Term Frequency-Inverse Document Frequency (TF-IDF)
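For reference, here is a sketch of the TF-IDF weighting in the form commonly used for tidy text analysis. As far as we know, this matches what `bind_tf_idf()` in the tidytext package computes, with idf using the natural logarithm. The notation is ours: $n_{t,d}$ is the count of term $t$ in document $d$, and $N$ is the number of documents in the collection. The second line is an illustrative worked example with made-up counts, not workshop data.

```latex
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\!\left(\frac{N}{\lvert\{\, d : n_{t,d} > 0 \,\}\rvert}\right), \qquad
\mathrm{tf\text{-}idf}(t, d) = \mathrm{tf}(t, d)\,\mathrm{idf}(t)

% Example: a term appearing 3 times in a 100-word document, and in 2 of 10 documents
\mathrm{tf} = \tfrac{3}{100} = 0.03, \qquad
\mathrm{idf} = \ln\!\left(\tfrac{10}{2}\right) \approx 1.609, \qquad
\mathrm{tf\text{-}idf} \approx 0.048
```

A term that appears in every document gets $\mathrm{idf} = \ln(1) = 0$, so TF-IDF down-weights words that are common across the whole collection and highlights words that are characteristic of individual documents.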