Most Common Sequences Finder Thingy

Crash Course

Outputs the 100 most common three-word sequences in the text, along with a count of how many times each occurred. The program is case-insensitive and ignores punctuation and line endings.
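
As a rough illustration of the counting described above, here is a minimal sketch. It is not the code in most_common_sequences.py: the function name `most_common_sequences`, its parameters, and the tokenizing regex are assumptions made for this example. In particular, the sketch treats an apostrophe inside a word (as in "don't") as part of the word, which the actual script may or may not do.

```python
import re
from collections import Counter


def most_common_sequences(text, n=3, limit=100):
    """Return up to `limit` (sequence, count) pairs, most common first.

    Hypothetical helper for illustration; not the script's actual API.
    """
    # Lowercase for case insensitivity. Matching only runs of word characters
    # (with optional internal apostrophes) makes punctuation and line endings
    # act purely as separators.
    words = re.findall(r"\w+(?:'\w+)*", text.lower())
    # Slide a window of n words across the list to form each sequence.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows).most_common(limit)


if __name__ == "__main__":
    sample = "The cat sat. The cat\nsat!"
    print(most_common_sequences(sample, limit=1))  # [('the cat sat', 2)]
```

Because sequences are built from the flat word list, a sequence can span a sentence or line boundary, which matches the statement that punctuation and line endings are ignored.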

Usage

The script accepts input either from stdin or as a list of file paths passed as arguments:

cat some-utf-8-encoded-textfile.txt | ./most_common_sequences.py

OR

./most_common_sequences.py some-file.txt another-file.txt
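
One way a script can support both invocation styles is the standard library's `fileinput` module, sketched below. This is an assumption about a possible approach, not a description of how most_common_sequences.py actually reads its input; the helper name `read_text` is hypothetical.

```python
import fileinput
import sys


def read_text(paths):
    """Return the combined text of `paths`, or of stdin when `paths` is empty.

    Hypothetical helper for illustration; not the script's actual API.
    """
    # An empty (non-None) file list makes fileinput fall back to sys.stdin.
    # hook_encoded decodes the named files as UTF-8; stdin keeps its own encoding.
    hook = fileinput.hook_encoded("utf-8")
    with fileinput.input(files=paths, openhook=hook) as lines:
        return "".join(lines)


if __name__ == "__main__":
    # File arguments if any were given, otherwise piped input.
    text = read_text(sys.argv[1:])
    print(len(text.split()), "words read")
```

Concatenating all files into one string means sequences are counted across the whole input rather than per file.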

Environment

Clone the repository:

$ git clone https://github.com/kaewarren/most-common-sequences.git

Create and activate a virtual environment in the same directory:

$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate

Install the required packages using pip:

(env)$ pip install -r requirements.txt
