Simple command-line text-processing utilities in Python.
Read lines from stdin. Replace Latin non-ASCII letters with closest equivalent ASCII letters using Python's unicodedata.decomposition() and a custom replacement table.
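The decomposition-based approach described above could look roughly like the following sketch. The function names and the contents of the `CUSTOM` table are hypothetical; the real replacement table may differ.

```python
import unicodedata

# Hypothetical replacement table for letters that have no Unicode
# decomposition; the utility's actual table is not shown in this README.
CUSTOM = {
    "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o", "ß": "ss",
    "Đ": "D", "đ": "d", "Ł": "L", "ł": "l", "Þ": "Th", "þ": "th",
}

def fold_char(char):
    """Return an ASCII stand-in for one character where one can be found."""
    if ord(char) < 128:
        return char
    if char in CUSTOM:
        return CUSTOM[char]
    decomp = unicodedata.decomposition(char)
    # Canonical decompositions look like "0041 0308"; compatibility
    # decompositions start with a "<tag>" and are left alone here.
    if decomp and not decomp.startswith("<"):
        base = chr(int(decomp.split()[0], 16))
        # Recurse because the base may itself decompose (or be in CUSTOM).
        return fold_char(base)
    return char  # no known replacement: keep the character unchanged

def fold_to_ascii(text):
    return "".join(fold_char(c) for c in text)
```

Characters outside the table that have no canonical decomposition pass through unchanged in this sketch.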
Print lines from stdin in a format suitable for caseless comparisons.
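One way to produce such a format is Python's built-in `str.casefold()`. This is an assumption; the utility may also apply Unicode normalization. The function name is hypothetical.

```python
def caseless_form(line):
    # str.casefold() is a more aggressive lower(): for example it maps
    # the German sharp s to "ss" so that both spellings compare equal.
    return line.casefold()
```

With this, `caseless_form("STRASSE")` and `caseless_form("Straße")` give the same string.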
Print unique lines and their counts from stdin.
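A minimal sketch of the counting step, assuming `collections.Counter` and first-seen output order (the actual utility's order and output format are not specified here):

```python
from collections import Counter

def count_lines(lines):
    """Map each distinct line (without its trailing newline) to its count."""
    return Counter(line.rstrip("\n") for line in lines)
```

For example, `count_lines(sys.stdin)` would count the lines of standard input.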
Search the Unix dictionary case-insensitively; intended for word games.
Dictionary words that contain characters other than A-Z or a-z are skipped.
Arguments: WORD SOMEWHERE NOWHERE
WORD: a word with hyphens (-) in place of unknown letters; required
SOMEWHERE: letters that occur somewhere in the word; optional; use a hyphen (-) for none if you want to set the next argument but not this one
NOWHERE: letters that don't occur in the word; optional
Examples:
CH---- EO RS
will find six-letter words that start with CH and contain E and O but no R or S (e.g. CHOICE).
---TH - EI
will find five-letter words that end with TH and don't contain E or I (e.g. CLOTH).
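The matching rules for the three arguments can be sketched as follows. `make_matcher` is a hypothetical helper, not the utility's actual code, and it is applied to single candidate words rather than to a dictionary file.

```python
import re

def make_matcher(word, somewhere="-", nowhere=""):
    """Build a predicate from the WORD, SOMEWHERE and NOWHERE arguments."""
    # Each hyphen in WORD stands for exactly one unknown letter.
    pattern = re.compile(
        "".join("[a-z]" if c == "-" else re.escape(c.lower()) for c in word)
    )
    required = set() if somewhere == "-" else set(somewhere.lower())
    forbidden = set(nowhere.lower())

    def matches(candidate):
        c = candidate.lower()
        # Skip words with anything other than A-Z or a-z.
        if not (c.isascii() and c.isalpha()):
            return False
        letters = set(c)
        return (
            pattern.fullmatch(c) is not None
            and required <= letters
            and not (forbidden & letters)
        )

    return matches
```

For instance, `make_matcher("CH----", "EO", "RS")` accepts CHOICE but rejects CHORES and CHURCH.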
Print lines from stdin in case-insensitive Finnish order. See Jukka Korpela: Nykyajan kielenopas – Aakkosjärjestys (in Finnish).
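As a deliberately simplified sketch, the sort key below only places å, ä and ö after z and ignores case. It does not implement the finer rules discussed in the reference (for example the treatment of other accented letters), and the name `finnish_key` is hypothetical.

```python
# Simplified Finnish alphabet: a-z followed by å, ä, ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {c: i for i, c in enumerate(ALPHABET)}

def finnish_key(line):
    # Characters outside the alphabet get rank -1, so they sort before
    # all letters, ordered among themselves by code point (an assumption).
    return [(RANK.get(c, -1), ord(c)) for c in line.casefold()]
```

Usage: `sorted(lines, key=finnish_key)`.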
Group lines from stdin by prefix (LENGTH > 0), suffix (LENGTH < 0) or entire line (LENGTH = 0). Print the number of lines in each group. Argument: LENGTH
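The sign convention for LENGTH maps naturally onto Python slicing. A minimal sketch, with hypothetical function names:

```python
from collections import Counter

def group_key(line, length):
    if length > 0:
        return line[:length]   # prefix of LENGTH characters
    if length < 0:
        return line[length:]   # suffix of -LENGTH characters
    return line                # LENGTH = 0: the entire line

def count_groups(lines, length):
    """Count how many lines fall into each group."""
    return Counter(group_key(line.rstrip("\n"), length) for line in lines)
```

Lines shorter than the requested length form a group under their full text in this sketch.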
Print the setwise union (OPERATION=u), intersection (OPERATION=i) or difference (OPERATION=d) of the lines in the given files, without duplicates. Arguments: OPERATION FILE1 [FILE2 ...]
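A sketch of the three operations using Python sets. Two points are assumptions: the difference is taken as the first file minus all the others, and the result is returned sorted. `combine` is a hypothetical name, and it takes lists of lines rather than file names.

```python
def combine(operation, line_lists):
    """Apply u(nion), i(ntersection) or d(ifference) across lists of lines."""
    if operation not in ("u", "i", "d"):
        raise ValueError("OPERATION must be u, i or d")
    sets = [set(lines) for lines in line_lists]
    result = sets[0]
    for other in sets[1:]:
        if operation == "u":
            result = result | other
        elif operation == "i":
            result = result & other
        else:
            result = result - other  # assumption: first file minus the rest
    return sorted(result)
```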
Romanize Russian (Cyrillic) text from stdin using the ISO 9 romanization. See Wikipedia: Venäjän translitterointi – ISO 9 (in Finnish).
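Because ISO 9 maps each Cyrillic letter to a single Latin character, the romanization can be sketched with `str.translate`. The table below covers only the 33 letters of the modern Russian alphabet and was written for this example; check it against the standard before relying on it. The name `romanize` is hypothetical.

```python
# Lowercase ISO 9 mapping for the modern Russian alphabet.
ISO9 = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ",
    "ъ": "\u02ba",  # hard sign -> modifier letter double prime
    "ы": "y",
    "ь": "\u02b9",  # soft sign -> modifier letter prime
    "э": "è", "ю": "û", "я": "â",
}

# Add uppercase pairs; the prime characters have no uppercase form
# and are left unchanged by str.upper().
TABLE = str.maketrans(
    {**ISO9, **{k.upper(): v.upper() for k, v in ISO9.items()}}
)

def romanize(text):
    """Romanize Russian text; other characters pass through unchanged."""
    return text.translate(TABLE)
```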
Print unique lines from stdin.
Print codepoints, names and counts of unique characters from stdin.
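A minimal sketch of such a report using `collections.Counter` and `unicodedata.name()`. The output order (by code point), the tuple layout and the placeholder for unnamed characters are assumptions.

```python
from collections import Counter
import unicodedata

def char_report(text):
    """Return (codepoint, name, count) tuples sorted by code point."""
    counts = Counter(text)
    return [
        # Control characters such as newline have no Unicode name,
        # so a placeholder is supplied as the default.
        (f"U+{ord(c):04X}", unicodedata.name(c, "(no name)"), n)
        for c, n in sorted(counts.items())
    ]
```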
Print words (sequences of Unicode Letter characters or non-initial non-final apostrophes) from stdin.
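The word definition above can be sketched with a small scanner that checks each character's Unicode general category. This assumes that only the ASCII apostrophe (U+0027) counts as an apostrophe; the function names are hypothetical.

```python
import unicodedata

def is_letter(char):
    # All Unicode Letter categories (Lu, Ll, Lt, Lm, Lo) start with "L".
    return unicodedata.category(char).startswith("L")

def words(text):
    """Split text into words made of letters and interior apostrophes."""
    result = []
    current = ""
    for c in text:
        # An apostrophe is accepted only after the word has started.
        if is_letter(c) or (c == "'" and current):
            current += c
        elif current:
            # Strip trailing apostrophes so none is word-final.
            result.append(current.rstrip("'"))
            current = ""
    if current:
        result.append(current.rstrip("'"))
    return result
```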