sts10 / tidy

Combine and clean word lists

Home Page: https://sts10.github.io/2021/12/09/tidy-0-2-0.html

Print statistics and attributes of both inputted and new lists

sts10 opened this issue

This output could be either on or off by default; I'm not sure which yet. Things to include:

  • list length
  • entropy per word
  • shortest word length
  • longest word length
  • minimum edit distance
  • minimum unique prefix
  • whether the list is above the brute-force line for lowercase English
  • whether the list is free of prefix words (i.e., whether it is a prefix code)
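
A minimal Rust sketch of how most of the statistics in the list above could be computed. The names `ListStats`, `compute_stats` and `edit_distance` are hypothetical, not the function in tidy's main.rs. Two definitions are assumptions: "edit distance" is taken to mean Levenshtein distance, and a list counts as above the brute-force line when its entropy per word divided by its shortest word's length is at most log2(26) ≈ 4.70 bits, the entropy of one random lowercase letter. That rule does agree with the example output posted later in this issue (12.3256 / 3 ≈ 4.11, reported as above the line).

```rust
/// Summary statistics for one word list (hypothetical struct, not tidy's own).
#[derive(Debug)]
struct ListStats {
    length: usize,
    entropy_per_word: f64,
    shortest: (usize, String),
    longest: (usize, String),
    free_of_prefix_words: bool,
    above_brute_force_line: bool,
    shortest_edit_distance: Option<usize>,
}

/// Levenshtein distance between two words, counted in chars (two-row DP).
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b_chars.len() + 1];
        for (j, &cb) in b_chars.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // Cheapest of substitution, deletion and insertion.
            curr[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Computes the statistics for `words`; returns None if no non-empty words remain.
fn compute_stats(words: &[&str]) -> Option<ListStats> {
    // Assumed cleanup: drop empty entries, then sort and deduplicate.
    let mut list: Vec<String> = words
        .iter()
        .filter(|w| !w.is_empty())
        .map(|w| w.to_string())
        .collect();
    list.sort();
    list.dedup();

    let length = list.len();
    // Bits of entropy for one word picked uniformly at random from the list.
    let entropy_per_word = (length as f64).log2();

    let shortest = list
        .iter()
        .min_by_key(|w| w.chars().count())
        .map(|w| (w.chars().count(), w.clone()))?;
    let longest = list
        .iter()
        .max_by_key(|w| w.chars().count())
        .map(|w| (w.chars().count(), w.clone()))?;

    // After sorting, any word that is a prefix of another is immediately
    // followed by a word it prefixes, so checking neighbours is enough.
    let free_of_prefix_words = list.windows(2).all(|p| !p[1].starts_with(p[0].as_str()));

    // Assumed rule: bits per word divided by the shortest word's length must not
    // exceed log2(26), the entropy of one random lowercase English letter.
    let above_brute_force_line = entropy_per_word / shortest.0 as f64 <= 26f64.log2();

    // O(n^2) scan over every pair; stays None for a one-word list.
    let mut shortest_edit_distance: Option<usize> = None;
    for (i, a) in list.iter().enumerate() {
        for b in &list[i + 1..] {
            let d = edit_distance(a, b);
            shortest_edit_distance = Some(shortest_edit_distance.map_or(d, |m| m.min(d)));
        }
    }

    Some(ListStats {
        length,
        entropy_per_word,
        shortest,
        longest,
        free_of_prefix_words,
        above_brute_force_line,
        shortest_edit_distance,
    })
}

fn main() {
    let words = ["aim", "bold", "cold", "zoologist"];
    if let Some(stats) = compute_stats(&words) {
        println!("{stats:#?}");
    }
}
```

The pairwise edit-distance scan is quadratic in list length, so it may be slow on lists of many thousands of words; it is kept naive here for clarity.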

Maybe use a table-printing crate such as prettytable-rs or comfy-table.

I've now calculated all but one of the desired statistics in a function in main.rs. Rather than using a table crate, I'm just printing them as neatly as I can by hand. Example:

Attributes of new list
----------------------
List length            : 5133
Entropy per word       : 12.3256
Length of shortest word: 3 (aim)
Length of longest word : 9 (zoologist)
Free of prefix words   : true
Above brute force line : true
Shortest edit distance : 2
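
The hand-aligned layout above can be produced with only Rust's standard formatting: find the width of the longest label, then left-align every label to that width with `{:<width$}` so the colons line up. `format_attributes` is a hypothetical helper, not tidy's code; the values in `main` are copied from the example output.

```rust
use std::fmt::Write;

/// Renders a title, a dashed underline, and colon-aligned "label: value" rows.
fn format_attributes(title: &str, rows: &[(&str, String)]) -> String {
    // Pad every label to the width of the longest one so the colons line up.
    let width = rows.iter().map(|(label, _)| label.chars().count()).max().unwrap_or(0);
    let mut out = String::new();
    writeln!(out, "{title}").unwrap();
    writeln!(out, "{}", "-".repeat(title.chars().count())).unwrap();
    for (label, value) in rows {
        // `{:<width$}` left-aligns the label and fills the rest with spaces.
        writeln!(out, "{label:<width$}: {value}").unwrap();
    }
    out
}

fn main() {
    // Values copied from the example output in this issue.
    let rows = [
        ("List length", 5133.to_string()),
        ("Entropy per word", format!("{:.4}", 5133f64.log2())),
        ("Length of shortest word", format!("{} ({})", 3, "aim")),
        ("Length of longest word", format!("{} ({})", 9, "zoologist")),
        ("Free of prefix words", true.to_string()),
        ("Above brute force line", true.to_string()),
        ("Shortest edit distance", 2.to_string()),
    ];
    print!("{}", format_attributes("Attributes of new list", &rows));
}
```

Computing the width from the labels, rather than hard-coding it, keeps the colons aligned if a longer label is added later.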

Done! I think I got the "Unique character prefix" statistic correct.
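
One way the "Unique character prefix" statistic could be computed, assuming it means the smallest n such that the first n characters of every word are distinct: sort the list, find the longest prefix shared by any adjacent pair, and add one. Checking adjacent pairs is enough because, in sorted order, the longest prefix shared by any two words is also shared by some pair of neighbours. `unique_character_prefix` and `shared_prefix_len` are hypothetical names, not tidy's functions. Note that for a list containing prefix words, the result can exceed some words' own length.

```rust
/// Number of leading chars two words have in common.
fn shared_prefix_len(a: &str, b: &str) -> usize {
    a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count()
}

/// Hypothetical: the smallest n such that the first n chars of every word
/// are distinct. Returns None when there are fewer than two distinct words.
fn unique_character_prefix(words: &[&str]) -> Option<usize> {
    let mut sorted: Vec<&str> = words.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    // The longest prefix shared by any pair is shared by some adjacent pair
    // in sorted order, so one more char than that separates every word.
    sorted
        .windows(2)
        .map(|pair| shared_prefix_len(pair[0], pair[1]) + 1)
        .max()
}

fn main() {
    let words = ["cold", "aim", "bold", "aisle"];
    // "aim" and "aisle" share "ai", so 3 chars are needed: prints Some(3).
    println!("{:?}", unique_character_prefix(&words));
}
```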

I could still explore using a crate to make a nicer table.