borowski / Wordlint

Wordlint: Plaintext redundancy linter written in Haskell

wordlint 0.2.0.4: a plaintext redundancy linter written in Haskell

#Description

Wordlint locates matching pairs of words repeated within a user-defined distance. Text may be linted by distance between words (that is, by word count), by line count, and/or by percentage of the total words in the file. Multiple lint types may be specified at one time. The user may also choose a minimum word length for matches.
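The word-count check described above can be illustrated with a short sketch. It is written in Python for brevity and is not Wordlint's Haskell implementation. The function name `find_repeats` is hypothetical, and two behaviours are assumptions: distance is taken as the difference between word positions, and each occurrence is paired only with its nearest previous occurrence.

```python
from typing import Dict, List, Tuple


def find_repeats(text: str, max_distance: int = 250,
                 min_length: int = 5) -> List[Tuple[str, int, int]]:
    """Return (word, first_index, second_index) for each repeated word.

    Indices are 0-based positions in the whitespace-split word list.
    Assumption: distance is the difference between the two positions.
    """
    words = text.split()
    last_seen: Dict[str, int] = {}  # word -> index of most recent occurrence
    matches = []
    for i, word in enumerate(words):
        if len(word) < min_length:
            continue  # too short to count, like "the" under the default of 5
        j = last_seen.get(word)
        if j is not None and i - j <= max_distance:
            matches.append((word, j, i))
        last_seen[word] = i
    return matches
```

For instance, `find_repeats("alpha beta gamma alpha", max_distance=3)` reports one pair for "alpha", while "beta" is skipped because it is shorter than five characters.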

Filters are available to remove punctuation, capitalization, and/or a user-defined list of words from the list of potential matches.

Various modes exist for data output, which is machine-readable by default with column-based formatting. Results may be sorted alphabetically by word, by position (line number), or by intervening distance between matches, and any of these sort orders may be combined with a human-readable mode. Additionally, an "error" mode may supersede these options to provide output designed for easy integration with text editors.

#Installation

Following Haskell convention, run

cabal update && cabal install wordlint

to install via Hackage.

To build locally, clone this repository, cd to it, and execute:

cabal update && cabal install

Afterward, ensure the binary wordlint is available in your system's $PATH. A man page is also available and may be copied to the user's .cabal directory:

cp man/man1/wordlint.1 ~/.cabal/share/man/man1/wordlint.1

#Options

--help

Display condensed help and exit.

-f, --file FILE

Specify an input file. If none is given, wordlint reads from stdin.

##Linting Options

-w, --words INT

Specify maximum intervening distance between returned word-pairs
measuring by word count. This may intersect with the --lines and
--percent options, but is ignored if -a is provided. Default is 250.

-l, --lines INT

Specify maximum intervening distance between returned word-pairs
measuring by line count. This may intersect with the --words and
--percent options, but is ignored if -a is provided. Default is 0 (off).

-p, --percent DOUBLE

Specify maximum intervening distance between returned word-pairs
measuring by percentage of words. This may intersect with the --words and
--lines options, but is ignored if -a is provided. Default is 0 (off).

-m, --matchlength INT

Specify minimum length of words to be matched, e.g. to reduce hits for "the".
Default is 5.
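How the three distance options might combine can be sketched as follows, again in Python and not taken from Wordlint's source. The helper `within_limits` and its parameters are hypothetical. It assumes that "intersect" means a pair must satisfy every limit that is switched on, that a limit of 0 is off, and that the percentage is the word gap divided by the total word count.

```python
def within_limits(word_gap: int, line_gap: int, total_words: int,
                  max_words: int = 0, max_lines: int = 0,
                  max_percent: float = 0.0) -> bool:
    """Return True if a pair satisfies every active limit (0 means off)."""
    checks = []
    if max_words > 0:
        checks.append(word_gap <= max_words)
    if max_lines > 0:
        checks.append(line_gap <= max_lines)
    if max_percent > 0:
        # Assumption: percent is measured against the file's total word count.
        checks.append(100.0 * word_gap / total_words <= max_percent)
    # With no active limit every pair passes, which resembles -a.
    return all(checks)
```

Under these assumptions, a pair 80 words and 30 lines apart passes a 100-word limit alone but fails once a 20-line limit is also active.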

##Filters

-b, --blacklist FILE

Specify a file containing a newline-separated list of words (no spaces) to
filter from matches. Pairs well with --nopunct, which is applied before the
blacklist. The blacklist itself is applied before the --nocaps filter, so
--nocaps will not interfere with, for example, proper names given in the
blacklist.

--nocaps

Ignore capitalization when determining matches.

--nopunct

Ignore punctuation when determining matches.
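The filter order described under --blacklist (punctuation first, then the blacklist, then capitalization) can be sketched like this. It is an illustrative Python sketch, not Wordlint's implementation. The function `apply_filters` is hypothetical, and stripping punctuation only from the ends of each word is an assumption.

```python
import string
from typing import Iterable, List


def apply_filters(words: List[str], blacklist: Iterable[str] = (),
                  nopunct: bool = False, nocaps: bool = False) -> List[str]:
    """Apply the filters in the documented order."""
    if nopunct:
        # Assumption: only leading and trailing punctuation is removed.
        words = [w.strip(string.punctuation) for w in words]
        words = [w for w in words if w]  # drop tokens that were all punctuation
    blocked = set(blacklist)
    # The blacklist runs before case folding, so it stays case-sensitive.
    words = [w for w in words if w not in blocked]
    if nocaps:
        words = [w.lower() for w in words]
    return words
```

With "Paris" in the blacklist, the proper name "Paris," is dropped once punctuation is stripped, while a lowercase "paris" in the text survives because case folding happens afterwards.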

##Output Options

-a, --all

Return all matched pairs of words regardless of intervening distance. Deactivates the distance limits set by -w, -l, and -p.

-h, --human

Return human-readable output. Compatible with all sorting options except
`--sort error`, which supersedes `--human`.

-s, --sort word|position|distance|error

Sort word pairs alphabetically, by line number, or by intervening
distance; or produce output designed for error checking in text
editors. These correspond, respectively, to the following values:

    - word
    - position (default)
    - distance
    - error
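The three sort orders can be sketched in a few lines of Python. This is not Wordlint's code, and the tuple layout `(word, first_line, second_line, distance)` is a hypothetical stand-in for whatever the tool stores internally. The `error` value is left out because it selects an output format, not an ordering.

```python
from operator import itemgetter
from typing import List, Tuple

Match = Tuple[str, int, int, int]  # (word, first_line, second_line, distance)


def sort_matches(matches: List[Match], key: str = "position") -> List[Match]:
    """Sort matches by word, by position of the first occurrence, or by distance."""
    keys = {
        "word": itemgetter(0),      # alphabetical
        "position": itemgetter(1),  # line of the first occurrence
        "distance": itemgetter(3),  # intervening distance
    }
    return sorted(matches, key=keys[key])
```

Calling `sort_matches(matches, "distance")` puts the closest repetitions first, which is often the most useful order when revising prose.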

#Examples

wordlint --file file.txt

Runs the default check: a word-based check on words of five or more characters. Pairs are reported when the two matching words are no more than 250 words apart. The results are in a machine-readable table format (e.g. for easy use with awk, sed, and the like).

wordlint --lines 20 --matchlength 7 --file file.txt

Finds matching strings of seven or more characters that have an intervening distance of twenty lines or fewer. Returns machine-readable output.

wordlint -w 100 -l 20 -m 7 -f file.txt

Finds matching strings of seven or more characters that fall within an intervening distance of both 100 words or fewer and twenty lines or fewer. Returns machine-readable output.

cat file.txt | wordlint --percent 2.5 -a -s word -h

Finds all matching strings of five or more characters within a 2.5% distance of one another within the file, and returns the output sorted alphabetically and in human-readable form.

wordlint -f file.txt -b dir/blacklist.txt --nopunct --nocaps -s error

Finds matching strings of five or more characters after punctuation, a list of blacklisted words, and all capitalization have been stripped from the possible matches. Returns output designed for use in text editors (e.g. Vim's 'errorformat' option).

#See Also

A Vim front-end to Wordlint, creatively named Wordlint.vim, is available at https://github.com/gbgar/Wordlint.vim

A compiler interface and a Flycheck interface are available for Emacs: https://github.com/gbgar/wordlint.el https://github.com/gbgar/flycheck-wordlint.el
