piperandrew / 255_IntroTextMiningR

Code for my course "Introduction to Literary Text Mining"


255_IntroTextMiningR


I recommend taking the following free 3-5 hour online course to familiarize yourself with R: https://www.udemy.com/course/r-basics/

In this course you will learn how to:

  • ingest a directory of any number of text files
  • understand properties peculiar to text data, such as Zipf's law
  • understand vector space models and the idea of distributional semantics
  • make a document term matrix
  • manipulate features so that you can run hypothesis tests on literary behaviour
  • discover distinctive words of different text collections
  • cluster texts to discover latent relationships between them
  • use machine learning to classify unknown documents by known labels
  • use sentiment analysis to think about the emotional valence of stories
  • use topic modeling to better understand the thematic focus of different text collections
  • work with expanded feature spaces such as parts of speech, characters, and literary spaces
  • create a final project where you test your beliefs about the behaviour of real-world literary data
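Several of the first steps listed above (ingesting a directory of text files, building a document term matrix, comparing texts in a vector space, and checking Zipf's law) can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the course's own code: it uses no packages, a deliberately naive tokenizer that keeps only lowercase ASCII letters, a hypothetical toy corpus written to a temporary folder, and hypothetical helper names (`read_corpus`, `tokenize`, `make_dtm`, `cosine_sim`, `zipf_table`).

```r
# Minimal base-R sketch (no packages). All helper names are hypothetical,
# chosen for illustration only.

# Read every .txt file in `dir` into a named character vector, one element per file
read_corpus <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(files,
                  function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                  character(1))
  names(texts) <- basename(files)
  texts
}

# Naive tokenizer (assumption): lowercase, turn every run of non a-z characters
# into a space, split on whitespace, and drop empty strings
tokenize <- function(text) {
  words <- strsplit(gsub("[^a-z]+", " ", tolower(text)), "\\s+")[[1]]
  words[words != ""]
}

# Document term matrix: one row per document, one column per word type,
# cells hold raw counts
make_dtm <- function(texts) {
  token_lists <- lapply(texts, tokenize)
  vocab <- sort(unique(unlist(token_lists)))
  counts <- lapply(token_lists,
                   function(tok) tabulate(match(tok, vocab), nbins = length(vocab)))
  dtm <- do.call(rbind, counts)  # row names come from the file names
  colnames(dtm) <- vocab
  dtm
}

# Vector space model: cosine similarity between every pair of document rows
cosine_sim <- function(dtm) {
  norms <- sqrt(rowSums(dtm^2))
  tcrossprod(dtm) / outer(norms, norms)
}

# Zipf's law: frequency is roughly proportional to 1 / rank, so on a large
# corpus log(freq) against log(rank) is close to a line with slope near -1
zipf_table <- function(dtm) {
  freq <- sort(colSums(dtm), decreasing = TRUE)
  data.frame(word = names(freq), freq = unname(freq), rank = seq_along(freq))
}

# Usage on a hypothetical two-file toy corpus
corpus_dir <- file.path(tempdir(), "toy_corpus")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The cat sat on the mat.", file.path(corpus_dir, "a.txt"))
writeLines("The dog sat on the log. The end.", file.path(corpus_dir, "b.txt"))

texts <- read_corpus(corpus_dir)
dtm <- make_dtm(texts)
print(dtm)
print(round(cosine_sim(dtm), 3))
zt <- zipf_table(dtm)
print(head(zt))
# The fitted slope is only meaningful on a real, reasonably large corpus
print(coef(lm(log(freq) ~ log(rank), data = zt)))
```

The raw-count matrix keeps the idea of a document term matrix visible; in practice you would likely also remove stopwords or normalize counts by document length before comparing texts, and the Zipf slope says little about a two-sentence toy corpus.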

Along the way you will experiment with many different types of data and questions:

  • long nineteenth-century novels
  • Sherlock Holmes stories and other short stories
  • fanfiction
  • what makes fictional writing distinct from non-fiction
  • gender differences in literary writing

At a high level, by the end of this course you will understand what it means to treat texts as data, and you will know some of the state-of-the-art tools available today for studying texts in an empirically grounded way. You will also become conversant in R, one of the two primary programming languages of data science today.



Languages

Language:R 100.0%