A command-line tool to summarize text.
Given a body of text, summer attempts to find the most important sentences. It computes the TF-IDF value of each term in the text, then keeps a running total of those values for each sentence to produce a sentence score. By default, the five highest-scoring sentences are printed.
NOTE: To improve readability, the printed sentences are sorted by their order of occurrence in the original text, not by score.
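The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the code in summer.py: the function names `split_sentences`, `tokenize`, and `summarize` are hypothetical, the regex splitter is a naive stand-in for the punkt tokenizer, and treating each sentence as a "document" when computing IDF is an assumption.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive stand-in for punkt: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens; a real tokenizer would be more careful.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize(text, num=5, stopwords=frozenset()):
    sentences = split_sentences(text)
    tokens = [[t for t in tokenize(s) if t not in stopwords] for s in sentences]

    # Term frequency over the whole text.
    tf = Counter(t for sent in tokens for t in sent)
    # Document frequency (assumption: each sentence counts as one document).
    df = Counter(t for sent in tokens for t in set(sent))
    n = len(sentences)
    tfidf = {t: tf[t] * math.log(n / df[t]) for t in tf}

    # Running total of term scores for each sentence.
    scores = [sum(tfidf[t] for t in sent) for sent in tokens]

    # Keep the top `num` sentences (0 means all), then restore text order.
    limit = n if num == 0 else num
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:limit]
    return [sentences[i] for i in sorted(top)]
```

Sorting the selected indices at the end is what puts the chosen sentences back in their original order, as the note above describes.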
python3-nltk
In addition to python3-nltk, you need the punkt tokenizer, which NLTK provides via its custom downloader.
The following command installs the tokenizer in the directory $HOME/nltk_data:
$ python3 -m nltk.downloader punkt
Use these commands to install the tokenizer as an administrator:
OS | Command |
---|---|
Windows | python -m nltk.downloader -d C:\nltk_data punkt |
Mac | python -m nltk.downloader -d /usr/local/share/nltk_data punkt |
Unix | python -m nltk.downloader -d /usr/share/nltk_data punkt |
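The same download can also be triggered from Python. The sketch below is only an illustration: the helper name `ensure_punkt` is hypothetical and not part of summer, and depending on the NLTK version a different resource name may be required.

```python
def ensure_punkt(download_dir=None):
    """Return True if the punkt tokenizer is available, downloading it if needed."""
    import nltk  # imported lazily so this module loads even without NLTK

    try:
        # Raises LookupError when the resource is not on NLTK's data path.
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        # download_dir=None lets NLTK pick its default data directory.
        return nltk.download("punkt", download_dir=download_dir)
```

Passing one of the directories from the table above as `download_dir` would mirror the `-d` option of the command-line downloader.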
summer.py [-h] [-n NUM] [-s STOP] [filename]
Argument | Description |
---|---|
filename | A file with text to summarize (Optional). Input is read from stdin if `filename` is omitted. |
-n NUM, --num NUM | The number of sentences to print (Optional). Default is 5; 0 prints all. |
-s STOP, --stop STOP | A file with stopwords to load (Optional). If omitted, all terms are processed. |
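For reference, an argument parser matching the usage line and table above could be built with `argparse` roughly like this. It is a sketch under the assumption that summer.py uses `argparse`; the function name `build_parser` and the help strings are illustrative.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="summer.py", description="Summarize text."
    )
    # Optional positional argument: None means "read from stdin".
    parser.add_argument("filename", nargs="?",
                        help="a file with text to summarize")
    # Number of sentences to print; 0 prints all.
    parser.add_argument("-n", "--num", type=int, default=5,
                        help="the number of sentences to print")
    # Optional stopword file; None means all terms are processed.
    parser.add_argument("-s", "--stop",
                        help="a file with stopwords to load")
    return parser
```

With no arguments, `build_parser().parse_args([])` yields `num=5` with `filename` and `stop` both set to `None`, which corresponds to the documented defaults.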