qalle2 / text-util

simple text-processing utilities

text-util

Simple command-line text-processing utilities in Python.

Line-oriented

asciify.py

Read lines from stdin. Replace non-ASCII Latin letters with their closest ASCII equivalents using Python's unicodedata.decomposition() and a custom replacement table.
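
A minimal sketch of this approach, not the script's actual code. The `CUSTOM` table and the function names are hypothetical; the real replacement table is not shown in this README.

```python
import unicodedata

# Hypothetical replacement table for letters that have no canonical
# decomposition; the entries are illustrative only.
CUSTOM = {
    "ø": "o", "Ø": "O", "ß": "ss", "æ": "ae", "Æ": "AE",
    "đ": "d", "Đ": "D", "ł": "l", "Ł": "L",
}

def asciify_char(char):
    """Return an ASCII stand-in for one character where one can be found."""
    if ord(char) < 128:
        return char
    if char in CUSTOM:
        return CUSTOM[char]
    # decomposition() returns e.g. "0061 0308" for "ä", or "" if there is none.
    decomp = unicodedata.decomposition(char)
    # Skip compatibility decompositions, which start with a tag such as "<compat>".
    if decomp and not decomp.startswith("<"):
        base = chr(int(decomp.split()[0], 16))
        return asciify_char(base)  # recurse for letters with several accents
    return char  # leave anything else unchanged

def asciify(text):
    """Apply asciify_char() to every character of the text."""
    return "".join(asciify_char(char) for char in text)
```

For stdin use, the function could be applied to each line, for example `for line in sys.stdin: print(asciify(line), end="")`.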

casefold.py

Print lines from stdin in a format suitable for caseless comparisons.
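
The description matches what Python's built-in `str.casefold()` does, so a sketch might look like the following. The function name `casefold_lines` is an assumption, and the script itself may do more than this.

```python
def casefold_lines(lines):
    """Yield each line in case-folded form for caseless comparison."""
    for line in lines:
        # casefold() is more aggressive than lower(): "ß" becomes "ss".
        yield line.casefold()
```

Unlike `lower()`, case folding makes "Straße" and "STRASSE" compare equal.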

countlines.py

Print unique lines and their counts from stdin.
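
One way to sketch the counting step is with `collections.Counter`. The function name is hypothetical, and the script's output format and ordering are not documented here.

```python
from collections import Counter

def count_lines(lines):
    """Map each distinct line (without its trailing newline) to its count."""
    return Counter(line.rstrip("\n") for line in lines)
```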

findword.py

Search the Unix dictionary case-insensitively. For playing word games.
Words with characters other than A-Z or a-z won't be searched.
Arguments: WORD SOMEWHERE NOWHERE
    WORD:      a word with hyphens (-) in place of unknown letters; required
    SOMEWHERE: letters that occur somewhere in the word; optional;
               use hyphen (-) for none if you don't want to set this argument
               but want to set the next argument
    NOWHERE:   letters that don't occur in the word; optional
Examples:
    CH---- EO RS
        will find six-letter words that start with CH and contain E and O
        but no R or S (e.g. CHOICE).
    ---TH - EI
        will find five-letter words that end with TH and don't contain E or I
        (e.g. CLOTH)
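
The matching rules above can be sketched as follows. This is an illustration under assumptions, not the script's code: `make_matcher` is a hypothetical name, and reading the dictionary file is left out.

```python
import re

def make_matcher(word, somewhere="-", nowhere=""):
    """Build a predicate that tests one dictionary word against the arguments."""
    # Each hyphen in WORD stands for exactly one unknown letter.
    pattern = re.compile(
        "".join("[a-z]" if char == "-" else re.escape(char) for char in word.lower())
    )
    # A lone hyphen means "no letters" for the optional arguments.
    required = set(somewhere.lower()) - {"-"}
    forbidden = set(nowhere.lower()) - {"-"}

    def matches(candidate):
        letters = candidate.lower()
        return (
            pattern.fullmatch(letters) is not None
            and required <= set(letters)
            and not (forbidden & set(letters))
        )

    return matches
```

A predicate built this way could then be applied to each line of a word list; `/usr/share/dict/words` is a common location on Unix systems, but the path the script uses is not stated here.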

finsort.py

Print lines from stdin in case-insensitive Finnish order. See Jukka Korpela: Nykyajan kielenopas – Aakkosjärjestys (in Finnish).
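
A simplified sketch of a sort key that places å, ä and ö after z. It assumes a plain 29-letter alphabet and ignores the finer rules (for example accented letters) that the referenced guide covers; `ALPHABET` and `finnish_key` are hypothetical names.

```python
# Assumed alphabet order: a to z followed by å, ä, ö.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {char: index for index, char in enumerate(ALPHABET)}

def finnish_key(line):
    """Return a case-insensitive sort key for one line."""
    key = []
    for char in line.casefold():
        if char in RANK:
            key.append((1, RANK[char]))   # letters of the alphabet above
        elif ord(char) < 128:
            key.append((0, ord(char)))    # ASCII spaces, digits, punctuation first
        else:
            key.append((2, ord(char)))    # any other character last (a simplification)
    return key
```

The key is meant to be passed to `sorted()`, as in `sorted(lines, key=finnish_key)`.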

grouplines.py

Group lines from stdin by prefix (LENGTH > 0), suffix (LENGTH < 0) or entire line (LENGTH = 0). Print the number of lines in each group. Argument: LENGTH
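
The effect of the sign of LENGTH can be sketched with string slicing. The function name `group_lines` is an assumption, and the script's output format is not shown here.

```python
from collections import Counter

def group_lines(lines, length):
    """Count lines per group; the group is a prefix, a suffix or the whole line."""
    def group_key(line):
        if length > 0:
            return line[:length]   # first LENGTH characters
        if length < 0:
            return line[length:]   # last -LENGTH characters
        return line                # LENGTH = 0: the entire line
    return Counter(group_key(line.rstrip("\n")) for line in lines)
```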

lineset.py

Print setwise union (OPERATION=u), intersection (OPERATION=i) or difference (OPERATION=d) of lines without duplicates. Arguments: OPERATION FILE1 [FILE2 ...]
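
A sketch of the three operations using Python sets. It assumes that the difference means the lines of FILE1 that occur in none of the other files, and it returns the result sorted; neither detail is confirmed by this README. The function name is hypothetical.

```python
def line_set_operation(operation, line_lists):
    """Combine the lines of one or more files with a set operation."""
    sets = [set(lines) for lines in line_lists]
    first, rest = sets[0], sets[1:]
    if operation == "u":
        result = first.union(*rest)
    elif operation == "i":
        result = first.intersection(*rest)
    elif operation == "d":
        # Assumption: lines of the first file that are in none of the others.
        result = first.difference(*rest)
    else:
        raise ValueError("operation must be 'u', 'i' or 'd'")
    return sorted(result)
```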

rus_iso9.py

Romanize Russian (Cyrillic) text from stdin using the ISO 9 romanization. See Wikipedia: Venäjän translitterointi – ISO 9 (in Finnish).
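
A sketch of a character-by-character mapping for the Russian alphabet following ISO 9:1995. The names `ISO9_LOWER` and `romanize` are hypothetical, and the script may handle more cases than this.

```python
# Lowercase Russian letters and their ISO 9 equivalents.
ISO9_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ",
    "ъ": "\u02ba",  # hard sign: modifier letter double prime
    "ы": "y",
    "ь": "\u02b9",  # soft sign: modifier letter prime
    "э": "è", "ю": "û", "я": "â",
}

# Build a translation table that covers both lowercase and uppercase letters.
TABLE = {}
for cyrillic, latin in ISO9_LOWER.items():
    TABLE[ord(cyrillic)] = latin
    TABLE[ord(cyrillic.upper())] = latin.upper()

def romanize(text):
    """Replace Russian Cyrillic letters; leave all other characters unchanged."""
    return text.translate(TABLE)
```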

uniquelines.py

Print unique lines from stdin.
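
A short sketch of one way to drop duplicate lines. It assumes the first occurrence of each line is kept in its original position; the script's actual ordering is not stated here, and the function name is hypothetical.

```python
def unique_lines(lines):
    """Return the distinct lines in order of first appearance."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(lines))
```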

Other

countchars.py

Print codepoints, names and counts of unique characters from stdin.
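
A sketch of the lookup using `collections.Counter` and `unicodedata.name()`. The function name, the `U+XXXX` formatting and the placeholder for unnamed characters are assumptions rather than the script's documented output.

```python
import unicodedata
from collections import Counter

def count_chars(text):
    """Return (code point, name, count) for each distinct character."""
    counts = Counter(text)
    return [
        # Control characters such as "\n" have no name, hence the default.
        (f"U+{ord(char):04X}", unicodedata.name(char, "(unnamed)"), count)
        for char, count in sorted(counts.items())
    ]
```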

getwords.py

Print words (sequences of Unicode Letter characters or non-initial non-final apostrophes) from stdin.
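
The word definition above can be sketched with `str.isalpha()`, which is true for characters in the Unicode Letter categories. The function name `get_words` is hypothetical and the script may be implemented differently.

```python
def get_words(text):
    """Return runs of letters and apostrophes, without edge apostrophes."""
    words = []
    current = []
    for char in text + " ":  # the trailing space flushes the last run
        if char.isalpha() or char == "'":
            current.append(char)
        elif current:
            # Apostrophes count only inside a word, so trim them at both ends.
            word = "".join(current).strip("'")
            if word:
                words.append(word)
            current = []
    return words
```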

License: GNU General Public License v3.0


Languages: Python 91.4%, Shell 8.6%