Outputs the 100 most common three-word sequences in the text, along with the number of times each one occurs. The program ignores punctuation and line endings, and it is case-insensitive.
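The counting step described above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the function name `most_common_trigrams` is hypothetical, and it assumes a "word" is a run of letters, digits, underscores, or apostrophes, with everything else (punctuation, spaces, line breaks) treated as a separator.

```python
import re
from collections import Counter


def most_common_trigrams(text, n=100):
    """Return the n most frequent three-word sequences as (sequence, count) pairs.

    Hypothetical sketch: a "word" is assumed to be a run of Unicode word
    characters or apostrophes; everything else acts as a separator.
    """
    # Lowercase first so "I LOVE" and "i love" count as the same sequence.
    # Splitting on non-word characters discards punctuation and line endings.
    words = re.findall(r"[\w']+", text.lower())
    # Pair each word with the two that follow it to form consecutive triples.
    trigrams = zip(words, words[1:], words[2:])
    counts = Counter(" ".join(t) for t in trigrams)
    # Ties keep the order in which sequences were first seen.
    return counts.most_common(n)


text = "I love sandwiches. I LOVE sandwiches!\nI love\nsandwiches, really."
print(most_common_trigrams(text, 2))
# → [('i love sandwiches', 3), ('love sandwiches i', 2)]
```

Because line breaks are treated like any other separator, a sequence can span two lines, as `i love sandwiches` does in the third occurrence above.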
The script reads input either from stdin or from one or more files passed as arguments:
cat some-utf-8-encoded-textfile.txt | ./most_common_sequences.py
OR
./most_common_sequences.py some-file.txt another-file.txt
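Both invocation styles can be handled by a small input helper. The sketch below is hypothetical (`read_input` is not taken from the repository) and assumes that when several files are given, their contents are joined into one text, so a sequence may span the end of one file and the start of the next.

```python
import sys


def read_input(paths, stdin=None):
    """Return the text of the given files, or of stdin if no paths are given.

    Hypothetical helper: when several files are given, their contents are
    joined with a newline and treated as one text.
    """
    if not paths:
        # No file arguments: read everything piped in on stdin
        # (decoded with the interpreter's default stdin encoding).
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    chunks = []
    for path in paths:
        # Files are decoded explicitly as UTF-8.
        with open(path, encoding="utf-8") as f:
            chunks.append(f.read())
    return "\n".join(chunks)
```

In a script, this would typically be called as `read_input(sys.argv[1:])`, with the resulting text passed on to the counting step. The optional `stdin` parameter exists only so a different stream can be supplied, for example in tests.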
- Clone the repository:
$ git clone https://github.com/kaewarren/most-common-sequences.git
- Change into the cloned directory, then create and activate a virtual environment there:
$ cd most-common-sequences
$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate
- Install the required packages using pip:
(env)$ pip install -r requirements.txt