sts10 / tidy

Combine and clean word lists

Home Page: https://sts10.github.io/2021/12/09/tidy-0-2-0.html

Remove all `attributes`/display information work, delegating it to new project: WLA

sts10 opened this issue

Tidy's getting a bit large (4k lines!). So I'm thinking of moving all of the word list information/auditing over to a new project that I'm tentatively calling WLA.

My question: should I now remove all of this list attributes/information code from Tidy? Thanks to the magic of the Unix philosophy and piping, Tidy users can simply pipe Tidy's output to `wla`, which will produce the same information as running `tidy -AAAA`:

```
tidy -D t eff.txt | wla
```
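
For piping to work, `wla` only has to read the list from standard input. Here is a minimal sketch in Rust, assuming `wla` reads newline-delimited words from stdin; the helper name `read_lines` is hypothetical and not taken from WLA's actual code. It deliberately keeps blank and duplicate lines exactly as received, since an auditor should inspect the list as given rather than clean it first.

```rust
use std::io::{self, BufRead};

/// Reads every line from `reader`, keeping blank lines and duplicates
/// exactly as they arrive (hypothetical helper, not WLA's real API).
fn read_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // `lines()` strips the trailing newline but does not drop empty lines.
    reader.lines().collect()
}

fn main() -> io::Result<()> {
    // When run as `tidy -D t eff.txt | wla`, stdin carries Tidy's output.
    let stdin = io::stdin();
    let lines = read_lines(stdin.lock())?;
    println!("Lines read: {}", lines.len());
    Ok(())
}
```

Because `read_lines` accepts any `BufRead`, the same function works on piped stdin, an opened file, or an in-memory buffer in tests.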

We could remove not only all of the attribute-printing code but also the `-G`/`-g` options, which I bet are confusing for users who compare them to the `-D`/`-d` options!

Besides reducing the size of the codebase, this also allows WLA to act more like a "true" auditor of word lists. One problem with having Tidy serve as both a word list creator and a word list auditing tool is that, if a list has duplicate or blank lines, Tidy quietly removes them before printing attributes, which is kind of a lie. For example:

```
blue
green
red

blue
```

`tidy -A` counts this as 3 words, rather than 4 or 5, since it automatically removes duplicates and blank lines before calculating attribute values such as list length.

In contrast, we can have `wla` count this as 5 "lines", then warn users that both blank and duplicate lines are present.
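
To make the difference concrete, here is a minimal Rust sketch contrasting the two approaches on the example list above. Neither function is Tidy's or WLA's real code: `tidy_style_count`, `audit`, and the `Audit` struct are hypothetical names, and treating whitespace-only lines as blank is an assumption.

```rust
use std::collections::HashSet;

/// Tidy-style count (hypothetical): drop blank lines and duplicates
/// first, then count the unique words that remain.
fn tidy_style_count(lines: &[&str]) -> usize {
    lines
        .iter()
        .filter(|l| !l.trim().is_empty())
        .collect::<HashSet<_>>()
        .len()
}

/// What an auditor could report without modifying the list (hypothetical).
#[derive(Debug, PartialEq)]
struct Audit {
    total_lines: usize,
    blank_lines: usize,
    duplicate_lines: usize,
}

fn audit(lines: &[&str]) -> Audit {
    let mut seen = HashSet::new();
    let mut blank_lines = 0;
    let mut duplicate_lines = 0;
    for line in lines {
        if line.trim().is_empty() {
            // Assumption: whitespace-only lines count as blank.
            blank_lines += 1;
        } else if !seen.insert(*line) {
            // `insert` returns false when the word was already present.
            duplicate_lines += 1;
        }
    }
    Audit { total_lines: lines.len(), blank_lines, duplicate_lines }
}

fn main() {
    let list = ["blue", "green", "red", "", "blue"];
    println!("Tidy-style word count: {}", tidy_style_count(&list));

    let report = audit(&list);
    println!("Lines: {}", report.total_lines);
    if report.blank_lines > 0 {
        println!("Warning: {} blank line(s)", report.blank_lines);
    }
    if report.duplicate_lines > 0 {
        println!("Warning: {} duplicate line(s)", report.duplicate_lines);
    }
}
```

On the example list, this prints a Tidy-style count of 3, then reports 5 lines with one warning for the blank line and one for the duplicate `blue`. The auditor counts everything as-is and only flags problems, leaving any cleanup to Tidy.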