allenwoods / wordsworth

Frequency analysis of letters, words and n-tuples.

wordsworth

Frequency analysis of letters, words and arbitrary-length n-tuples of words.

###Basic wordsworth:

####Example 1: Print the top 50 n-words in textfile.txt

$ python wordsworth --filename textfile.txt --top 50
$ python wordsworth -f textfile.txt -t 50
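To illustrate what a top-N word count involves, here is a minimal sketch in Python. It is not taken from the wordsworth source: the function name `top_words` and the tokenizing regular expression are assumptions made for this example.

```python
import re
from collections import Counter

def top_words(text, top=50):
    """Return the `top` most frequent words in `text` as (word, count) pairs."""
    # Assumed tokenization: lowercase, then keep runs of letters, digits
    # and apostrophes. The real tool may split words differently.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(top)

sample = "the cat sat on the mat and the dog sat too"
print(top_words(sample, top=2))  # [('the', 3), ('sat', 2)]
```

`Counter.most_common` sorts by count, so the `top` argument plays the same role as `--top` in the commands above.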

####Example 2: Print the top n-tuples of up to 10 words in textfile.txt

$ python wordsworth --filename textfile.txt --ntuple 10
$ python wordsworth -f textfile.txt -n 10
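Counting n-tuples "of up to 10 words" can be pictured as sliding windows of each length over the word list. The sketch below is one possible way to do that; `ntuple_counts` is a hypothetical helper, and starting at length 2 is an assumption rather than documented behavior.

```python
from collections import Counter

def ntuple_counts(words, max_n):
    """Count every run of 2..max_n consecutive words, keyed by run length."""
    counts = {}
    for n in range(2, max_n + 1):
        # Zipping n staggered slices of the list yields each window of length n.
        windows = zip(*(words[i:] for i in range(n)))
        counts[n] = Counter(windows)
    return counts

words = "to be or not to be".split()
counts = ntuple_counts(words, max_n=3)
print(counts[2].most_common(1))  # [(('to', 'be'), 2)]
```

Keeping one counter per length makes it easy to print the most frequent pairs, triples and so on separately.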

####Example 3: Ignore the words 'the', 'a' and '--'.

$ python wordsworth --filename textfile.txt --ignore the,a,--
$ python wordsworth -f textfile.txt -i the,a,--

####Example 4: Ignore just '--'.

$ python wordsworth --filename textfile.txt --ignore ,--
$ python wordsworth -f textfile.txt -i ,--
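The ignore list in Examples 3 and 4 is a comma-separated string. A rough sketch of how such a string might be turned into a filter follows; `parse_ignore` and `count_without` are hypothetical names, and dropping empty entries (such as the one produced by the leading comma in `,--`) is an assumption about how the tool behaves.

```python
from collections import Counter

def parse_ignore(arg):
    """Split a comma-separated ignore string, dropping empty entries."""
    return {item for item in arg.split(",") if item}

def count_without(words, ignore):
    """Count words, skipping any that appear in the ignore set."""
    return Counter(w for w in words if w not in ignore)

words = "the cat -- a stray -- sat on the mat".split()
print(parse_ignore(",--"))  # {'--'}
print(count_without(words, parse_ignore("the,a,--")))
```

Using a set keeps the membership test cheap even when the ignore list is long.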

###NLTK-enabled wordsworth:

wordsworth-nltk.py provides extended analysis, including a frequency analysis of verbs, nouns, adjectives, pronouns etc. To run this script you will need to install the Python Natural Language Toolkit (NLTK) and the Brown dataset, which is used for token tagging. Both are simple to install.

Step 1. Install NLTK

$ sudo pip install nltk

Step 2. Launch the Python interpreter

$ python

Step 3. Download the Brown dataset and the Punkt tokenizer models

>>> import nltk
>>> nltk.download('brown')
>>> nltk.download('punkt')
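Once tokens carry part-of-speech tags, a per-category frequency count could look roughly like the sketch below. To stay runnable without NLTK it takes already-tagged `(word, tag)` pairs as input. The tags follow the Brown tagset, but the prefix-to-category mapping is a simplification assumed for this example and is not taken from wordsworth-nltk.py.

```python
from collections import Counter, defaultdict

# Assumed, simplified mapping from Brown-style tag prefixes to categories.
CATEGORIES = {"NN": "noun", "VB": "verb", "JJ": "adjective", "PP": "pronoun"}

def category_counts(tagged):
    """Group (word, tag) pairs by category and count words within each."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        category = CATEGORIES.get(tag[:2])
        if category:  # Tags outside the mapping (articles etc.) are skipped.
            counts[category][word.lower()] += 1
    return counts

# Hand-tagged sample data for illustration only.
tagged = [("She", "PPS"), ("saw", "VBD"), ("the", "AT"), ("red", "JJ"),
          ("kite", "NN"), ("and", "CC"), ("she", "PPS"), ("smiled", "VBD")]
counts = category_counts(tagged)
print(counts["pronoun"].most_common(1))  # [('she', 2)]
```

In practice the tagged pairs would come from an NLTK tagger rather than being written by hand.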

###Example output:

[Six screenshots of example output; images not available.]


License:GNU General Public License v3.0

