lhehnke / text-mining-literature

Scripts for processing and mining (classic) literature and other text data, such as screenplays

text-mining-literature

Scripts for processing and mining (classic) literature and PDF files

Description: text_mining_dracula

The script covers

  • downloading and processing public domain works from the Project Gutenberg collection with the gutenbergr package
  • transforming works into a tidy format
  • mining works by
    • calculating and plotting word frequencies
    • plotting word and comparison clouds
    • conducting sentiment analyses (NRC lexicon)

using Bram Stoker's Dracula as an example.
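The tidy-format and word-frequency steps can be sketched in base R, so the example runs without extra packages. This is a minimal illustration under stated assumptions, not the repository's script: a toy two-line text stands in for the downloaded novel, the commented-out gutenbergr call assumes 345 is Dracula's Project Gutenberg ID, and `my_stop_words` is a hypothetical stand-in for a full stop-word list.

```r
# Toy stand-in for the novel. In a gutenbergr workflow the text would come from
# something like: dracula <- gutenbergr::gutenberg_download(345)
# (345 is assumed here to be the Project Gutenberg ID of Dracula).
lines <- c("The Count smiled and the night was dark.",
           "The night was long, and the Count was awake.")

# Tidy format: one row per token, keeping the line each token came from.
tokens <- strsplit(tolower(lines), "[^a-z']+")
tidy_words <- data.frame(
  line = rep(seq_along(tokens), lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)
tidy_words <- tidy_words[tidy_words$word != "", ]

# Hypothetical minimal stop-word list; a real analysis would use a full list.
my_stop_words <- c("the", "and", "was")
tidy_words <- tidy_words[!tidy_words$word %in% my_stop_words, ]

# Word frequencies, most frequent first.
freq <- sort(table(tidy_words$word), decreasing = TRUE)
print(head(freq, 5))

# Bar plot of the top words (written to Rplots.pdf when run non-interactively).
barplot(head(freq, 5), las = 2, main = "Most frequent words")
```

Keeping the `line` column alongside each word is what makes the result "tidy": any later step (stop-word filtering, joins with a sentiment lexicon, counting) is a plain row operation on one table.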

Description: text_mining_the_room

Corresponding blog post: https://lhehnke.github.io/notes/2018/01/25/text_mining_the_room

The script covers

  • downloading, importing and processing PDF files in R
  • transforming PDF files into a tidy format
  • mining PDF files by
    • calculating and plotting word frequencies
    • conducting sentiment analyses (NRC and Bing lexicons)
    • plotting word and comparison clouds
    • visualizing the most frequent positive and negative words (Bing lexicon sentiments)

using the screenplay of The Room, a.k.a. the worst film ever made (written, directed and produced by, and starring, Tommy Wiseau).

Source: https://theroomscriptblog.files.wordpress.com/2016/04/the-room-original-script-by-tommy-wiseau.pdf
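The step that finds the most frequent positive and negative words can be sketched as a join between tidy words and a two-column lexicon. This base-R sketch rests on assumptions: the PDF is taken to be already read into a character vector of pages (for example with pdftools::pdf_text(), which returns one string per page, shown commented out with a hypothetical local filename), two toy lines stand in for the screenplay, and `lexicon` is a hypothetical four-word stand-in for the Bing lexicon, which tidytext provides via get_sentiments("bing").

```r
# With pdftools, pages would come from the PDF, e.g. (hypothetical local path):
# pages <- pdftools::pdf_text("the-room-original-script-by-tommy-wiseau.pdf")
# Toy stand-in text (hypothetical, not taken from the screenplay):
pages <- c("I love you, you are great.",
           "This is bad. I feel bad, not great.")

# Tidy format: one row per word.
words <- unlist(strsplit(tolower(pages), "[^a-z']+"))
words <- words[words != ""]
tidy_words <- data.frame(word = words, stringsAsFactors = FALSE)

# Hypothetical mini lexicon with the same two columns as the Bing lexicon.
lexicon <- data.frame(
  word = c("love", "great", "bad", "hate"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: keep only words that appear in the lexicon.
scored <- merge(tidy_words, lexicon, by = "word")

# Count each word within its sentiment, then keep the top n words per sentiment.
counts <- aggregate(n ~ word + sentiment, data = transform(scored, n = 1), FUN = sum)
counts <- counts[order(counts$sentiment, -counts$n), ]
top_n <- 2
top_words <- do.call(rbind, lapply(split(counts, counts$sentiment), head, top_n))
print(top_words)
```

Like any plain lexicon join, this ignores context and negation: "not great" still counts "great" as positive.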



Language: R (100.0%)