kaesackett / most-common-sequences

Outputs a list of the 100 most common three-word sequences in the text, along with a count of how many times each occurred. The program ignores punctuation and line endings and is case-insensitive.

Most Common Sequences Finder Thingy

Crash Course

Outputs the 100 most common three-word sequences in the text, along with a count of how many times each occurred. The program ignores punctuation and line endings and is case-insensitive.
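As a rough illustration of the behavior described above, here is a minimal Python 3 sketch. It is not the script's actual code: the function name `most_common_sequences`, its parameters, and the choice of `\w+` for splitting words are all assumptions made for this example. One consequence of that regex is that apostrophes split words, so "don't" is counted as "don" and "t".

```python
import re
from collections import Counter


def most_common_sequences(text, n=3, limit=100):
    """Return up to `limit` (sequence, count) pairs, most common first.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Lowercase so that "The" and "the" are counted as the same word.
    # \w+ skips punctuation and treats line endings like any other whitespace.
    words = re.findall(r"\w+", text.lower())
    # Slide a window of n words across the word list.
    sequences = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(sequences).most_common(limit)
```

Calling `most_common_sequences(text)` returns a list of `(sequence, count)` tuples that can be printed one per line. A text with fewer than three words yields an empty list.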

Usage

The script accepts input either from stdin or from one or more files passed as arguments:

cat some-utf-8-encoded-textfile.txt | ./most_common_sequences.py

OR

./most_common_sequences.py some-file.txt another-file.txt
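One way a script could support both invocation styles is sketched below in Python 3. This is an assumption about how such input handling might look, not the repository's implementation; the helper name `read_text` is hypothetical, and UTF-8 is assumed for files because the example above mentions a UTF-8 encoded text file.

```python
import sys


def read_text(paths):
    """Return the text of all files in `paths`, or of stdin if `paths` is empty.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if not paths:
        # No file arguments were given: read whatever was piped in.
        return sys.stdin.read()
    chunks = []
    for path in paths:
        # Assumes UTF-8 input files.
        with open(path, encoding="utf-8") as handle:
            chunks.append(handle.read())
    # Join with a newline so the last word of one file and the first word
    # of the next stay separate words.
    return "\n".join(chunks)
```

A script would typically call `read_text(sys.argv[1:])`. Note that with this sketch, sequences can span the boundary between two files, since the files are treated as one continuous text.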

Environment

  1. Clone the repository:
$ git clone https://github.com/kaewarren/most-common-sequences.git
  2. Create and activate a virtual environment in the same directory:
$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate
  3. Install the required packages using pip:
(env)$ pip install -r requirements.txt

Languages

Language:Python 100.0%