summer.py

A command-line tool to summarize text.

Given a body of text, summer attempts to find the most important sentences. It computes the TF-IDF value of each term in the text, then scores each sentence by keeping a running total of the values of its terms. By default, the five highest-scoring sentences are printed.

NOTE: To improve readability, the printed sentences are sorted by their order of occurrence in the original text rather than by score.
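The scoring and ordering described above can be sketched as follows. This is a minimal illustration, not the code in summer.py: the function names are hypothetical, a naive regex splitter stands in for the punkt tokenizer so the example has no dependencies, and it assumes each sentence counts as one "document" when computing the IDF part, which the description above does not specify.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive stand-in for punkt: split on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens; a real tokenizer would be more careful.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize(text, num=5, stopwords=frozenset()):
    """Return the top-scoring sentences, in their original order (hypothetical helper)."""
    sentences = split_sentences(text)
    tokenized = [[t for t in tokenize(s) if t not in stopwords] for s in sentences]

    # Term frequency over the whole text.
    tf = Counter(t for tokens in tokenized for t in tokens)
    total = sum(tf.values())

    # Document frequency, ASSUMING one sentence = one document.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(sentences)
    tfidf = {t: (tf[t] / total) * math.log(n / df[t]) for t in tf}

    # Running total of TF-IDF values for each sentence.
    scores = [sum(tfidf[t] for t in tokens) for tokens in tokenized]

    # num == 0 means "all sentences", mirroring the -n option.
    limit = n if num == 0 else min(num, n)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:limit]

    # Re-sort the selected indices so output follows the original text order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, num=2)` returns the two highest-scoring sentences, listed in the order they appear in `text`.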

Dependencies

python3-nltk

In addition to python3-nltk, you need the punkt tokenizer, which NLTK provides through its custom downloader.

The following command installs the tokenizer in the directory $HOME/nltk_data:

$ python3 -m nltk.downloader punkt

Use these commands to install the tokenizer as an administrator:

OS	Command
Windows	`python -m nltk.downloader -d C:\nltk_data punkt`
Mac	`python -m nltk.downloader -d /usr/local/share/nltk_data punkt`
Unix	`python -m nltk.downloader -d /usr/share/nltk_data punkt`
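To see which of the locations above already holds the tokenizer, a small standard-library check can help. This is only a sketch: `find_punkt` is a hypothetical helper, not part of summer.py or NLTK, and it assumes the downloader stores the package under a `tokenizers` subdirectory, either unpacked as `punkt` or as `punkt.zip`.

```python
import os


def find_punkt(data_dirs):
    """Return the first directory in data_dirs that holds punkt, or None."""
    for data_dir in data_dirs:
        tokenizers = os.path.join(data_dir, "tokenizers")
        # ASSUMPTION: punkt lives at tokenizers/punkt or tokenizers/punkt.zip
        unpacked = os.path.isdir(os.path.join(tokenizers, "punkt"))
        zipped = os.path.isfile(os.path.join(tokenizers, "punkt.zip"))
        if unpacked or zipped:
            return data_dir
    return None


# The per-user and administrator locations listed in this section.
candidates = [
    os.path.join(os.path.expanduser("~"), "nltk_data"),
    r"C:\nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/share/nltk_data",
]
print(find_punkt(candidates))
```

The script prints the first matching directory, or `None` if the tokenizer is not found in any of them.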

Usage

summer.py [-h] [-n NUM] [-s STOP] [filename]

Argument	Description
filename	A file with text to summarize (Optional). Input is read from stdin if `filename` is omitted.
-n NUM, --num NUM	The number of sentences to print (Optional). Default is 5; 0 prints all.
-s STOP, --stop STOP	A file with stopwords to load (Optional). If omitted, all terms are processed.
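For readers who want to see how such an interface maps onto code, here is a minimal `argparse` sketch that produces the same usage line. It is an illustration under assumptions, not the parser from summer.py: `build_parser` and `read_input` are hypothetical names, and the UTF-8 encoding is a guess.

```python
import argparse
import sys


def build_parser():
    """Build a parser matching the usage line above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="summer.py", description="Summarize text.")
    # nargs="?" makes the positional argument optional.
    parser.add_argument("filename", nargs="?",
                        help="a file with text to summarize; stdin if omitted")
    parser.add_argument("-n", "--num", type=int, default=5,
                        help="number of sentences to print; 0 prints all")
    parser.add_argument("-s", "--stop",
                        help="a file with stopwords to load")
    return parser


def read_input(filename):
    """Read the text from filename, or from stdin when no filename is given."""
    if filename is None:
        return sys.stdin.read()
    # ASSUMPTION: input files are UTF-8 encoded.
    with open(filename, encoding="utf-8") as handle:
        return handle.read()
```

With this sketch, `build_parser().parse_args(["-n", "3", "notes.txt"])` yields `num=3`, `filename="notes.txt"`, and `stop=None`.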

About

GNU General Public License v3.0
