Flowers for Algernon
Natural language processing and textual analysis of the novel Flowers for Algernon.
Introduction
Flowers for Algernon is a 1966 novel by Daniel Keyes that tells the story of Charlie Gordon, a 32-year-old man with an IQ of 68. Charlie undergoes an operation to increase his intelligence, and the operation is a success, with his IQ eventually reaching 185. However, the effects are temporary, and Charlie begins to regress to his original state.[1]
Charlie's progression throughout the novel is analyzed using natural language processing.
Tableau Dashboard
Find the Tableau Dashboard here.
Methodology
- Words were tokenized using NLTK's word_tokenize function and a custom regex tokenizer.
- Misspellings were checked against the Brown corpus, the Words corpus, and a manually generated list of valid words. Together, these contained 261822 'valid' words.
- Sentiment analysis was completed using a lexicon-based approach that references the NRC Emotion Lexicon.
- Various reading scores, such as the Flesch-Kincaid readability tests, were calculated using the Python package textstat.
- CSVs were exported to build the visualization in Tableau.
See Jupyter Notebooks for full details.
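As a rough illustration of the tokenization, misspelling, and lexicon-based sentiment steps, the sketch below uses only the standard library. It is not the notebook code: the regex, the function names, the tiny VALID_WORDS set, and the toy EMOTION_LEXICON entries are all assumptions made for this example, standing in for the real word lists and the NRC Emotion Lexicon.

```python
import re
from collections import Counter

# Hypothetical stand-in for the combined valid-word list
# (the project used the Brown corpus, the Words corpus, and a manual list).
VALID_WORDS = {"i", "am", "happy", "to", "see", "my", "friend", "today"}

# Toy entries for illustration only; NOT copied from the NRC Emotion Lexicon.
EMOTION_LEXICON = {
    "happy": {"joy", "positive"},
    "friend": {"joy", "trust", "positive"},
    "afraid": {"fear", "negative"},
}

# Assumed regex: runs of letters, optionally with one internal apostrophe.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def misspelling_rate(tokens, valid_words):
    """Return the fraction of tokens that are not in the valid-word set."""
    if not tokens:
        return 0.0
    misspelled = [token for token in tokens if token not in valid_words]
    return len(misspelled) / len(tokens)


def emotion_counts(tokens, lexicon):
    """Count how many tokens are associated with each emotion label."""
    counts = Counter()
    for token in tokens:
        counts.update(lexicon.get(token, ()))
    return counts


# Invented sample sentence (not a quotation from the novel).
entry = "I am happy to see my frend today"
tokens = tokenize(entry)
rate = misspelling_rate(tokens, VALID_WORDS)
emotions = emotion_counts(tokens, EMOTION_LEXICON)
```

Computing these values per progress report and plotting them over time would give the kind of progression curve the dashboard visualizes, although the exact aggregation used in the notebooks may differ.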
Data
Data is taken from the novel under fair use, for educational and commentary purposes only.
Exported data used to build the Tableau visualization can be found here.
References, Resources, and Inspirations
- [1] https://en.wikipedia.org/wiki/Flowers_for_Algernon
- Python Packages: Pandas, NumPy, NLTK, SpellChecker, TextStat
- NRC Emotion Lexicon
- Tableau Dashboard Inspirations: The Eponymous Phrase, Sentiment Analysis w/ Quentin Tarantino
Extensions and To-do
- Finish write-up
- Remove unnecessary data (to speed up Tableau)
- Generate random text using a trigram (or other n-gram) model
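For the trigram to-do item, one possible starting point is sketched below. It is an assumption about how the extension might look, not existing project code: the function names are hypothetical, the sample text is invented rather than taken from the novel, and the model simply samples the next word from the words observed after each two-word context.

```python
import random
from collections import defaultdict


def build_trigram_model(tokens):
    """Map each pair of consecutive words to the list of words seen after it."""
    model = defaultdict(list)
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        model[(first, second)].append(third)
    return model


def generate(model, seed, length, rng):
    """Extend a two-word seed until `length` words or an unseen context."""
    output = list(seed)
    while len(output) < length:
        followers = model.get(tuple(output[-2:]))
        if not followers:
            break  # no trigram starts with the last two words
        output.append(rng.choice(followers))
    return output


# Invented sample text for illustration.
tokens = "the mouse ran the maze and the mouse ran home".split()
model = build_trigram_model(tokens)
sample = generate(model, ("the", "mouse"), 8, random.Random(0))
print(" ".join(sample))
```

Because followers are stored with repetition, frequent continuations are sampled proportionally more often without any explicit probability table.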
Author
Alex Chung