Materials for the useR! 2017 (Brussels) tutorial to be held on 4 July 2017. This tutorial will introduce the basic components of natural language processing and give users the tools to apply these techniques to their own data. Our focus is on explaining the why behind each component of the natural language processing pipeline, in addition to the how.
If you would like to follow along, clone this repository (or download it as a zip file). It contains all of the required code, images, and data.
The tutorial is broken into the following parts:
- Preliminaries
- Topic 1: Exploratory Analysis of Tokenized Text
- Topic 2: The NLP Pipeline
- Topic 3: Modelling Textual Data
- What's Next
It is our intention to keep this repository up and accessible indefinitely. Please share and enjoy! We have a gentler introduction to the same material over on the Programming Historian, and more in-depth coverage in our book Humanities Data in R. For questions, please contact us at @statsmaths and @nolauren (both Twitter and GitHub).