egorcherkasoff / regex-nlp-text-analysis

A Jupyter notebook where I use NLP and practice regex

Analyzing "War and Peace, vol. 1" with Regular Expressions and NLTK

This Jupyter Notebook project uses Python's re (regular expression) and nltk (Natural Language Toolkit) packages to analyze volume 1 of Leo Tolstoy's classic novel "War and Peace" in the original Russian.

Getting Started

Before running this notebook on your local machine, you will need to clone this repository. You might also need to install the following packages:

  • nltk: for natural language processing (install with: pip install nltk)

The notebook contains several code cells that analyze the text of the novel using regular expressions and NLTK.
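
As a rough idea of what the regex side of such a cell can look like, here is a minimal sketch of pulling Russian words out of mixed text. The pattern, the function name extract_words, and the sample string are assumptions made for illustration, not code taken from the notebook.

```python
import re

# Runs of Cyrillic letters (ё is listed separately because it sits outside
# the а-я range), optionally joined by internal hyphens as in "что-нибудь".
WORD_RE = re.compile(r"[а-яё]+(?:-[а-яё]+)*", re.IGNORECASE)


def extract_words(text):
    """Return the Russian words in text, lowercased (hypothetical helper)."""
    return [w.lower() for w in WORD_RE.findall(text)]


# Invented sample mixing French and Russian; not a quotation from the novel.
sample = "Eh bien, mon prince. Ну, что-нибудь скажите, князь!"
words = extract_words(sample)
print(words)
```

Restricting the character class to Cyrillic skips the French passages in the text, which may or may not be what you want for a given analysis.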

Why I made this

This project demonstrates how regular expressions and NLTK can be used to analyze the text of a classic novel written in Russian. Using regular expressions to extract words and NLTK to process them, I identified the most common words in the novel, removed common stopwords, and determined the overall mood of the book. For the most part, though, I was just practicing regex here.
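
The word-frequency step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the function top_words is hypothetical, the sample sentence is invented, and the stopword set is a tiny hand-written stand-in so the snippet runs without downloading NLTK data. With NLTK installed and its stopwords corpus downloaded, nltk.corpus.stopwords.words("russian") would supply a fuller list.

```python
import re
from collections import Counter

# Tiny hand-written stopword set, for illustration only (an assumption).
STOPWORDS = {"и", "в", "не", "на", "что", "он", "с"}


def top_words(text, n=3):
    """Return the n most frequent non-stopword Russian words (hypothetical helper)."""
    words = re.findall(r"[а-яё]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Invented sample sentence; not a quotation from the novel.
sample = "Князь Андрей вошел в залу, и князь Василий увидел, что князь Андрей не один."
print(top_words(sample, 2))
```

Filtering stopwords before counting keeps function words such as "и" and "в" from dominating the top of the list.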
