sts10 / tidy

Combine and clean word lists

Home Page: https://sts10.github.io/2021/12/09/tidy-0-2-0.html



Which Unicode normalization should Tidy use?

sts10 opened this issue

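It seems like there are four major "normalization forms" for Unicode:

- NFD (Normalization Form Canonical Decomposition)
- NFC (Normalization Form Canonical Composition)
- NFKD (Normalization Form Compatibility Decomposition)
- NFKC (Normalization Form Compatibility Composition)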

The unicode_normalization crate I'm using supports all four. I chose NFC somewhat arbitrarily in #25, but I'd like to make sure it's the best choice for Tidy users. I could also give users the option to choose among all four!
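Why the choice matters for a word-list tool: the same visible word can be stored as different sequences of code points, and a tool that compares raw strings will treat those as different words. The std-only Rust sketch below shows the problem. The `unique_count` helper is a hypothetical stand-in for a naive de-duplication pass, not Tidy's actual code. Normalizing both strings to the same form first (for example with the unicode_normalization crate's `nfc()` iterator method, collected back into a `String`) would make them compare equal.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: counts distinct entries the way a naive
/// de-duplication pass would, by comparing raw strings with no
/// Unicode normalization applied.
fn unique_count(words: &[&str]) -> usize {
    let set: BTreeSet<&str> = words.iter().copied().collect();
    set.len()
}

fn main() {
    // "café" written with the precomposed code point U+00E9 (its NFC spelling).
    let composed = "caf\u{e9}";
    // "café" written as "e" + U+0301 COMBINING ACUTE ACCENT (its NFD spelling).
    let decomposed = "cafe\u{301}";

    // The two render identically but are different UTF-8 byte sequences.
    println!("{} bytes vs {} bytes", composed.len(), decomposed.len());
    println!("equal? {}", composed == decomposed);

    // Without normalization, both spellings survive de-duplication.
    println!("unique words: {}", unique_count(&[composed, decomposed]));
}
```

Running it prints `5 bytes vs 6 bytes`, `equal? false` and `unique words: 2`: a list containing both spellings would keep what a reader sees as a duplicate.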

What other similar projects use

So far I've only found one project that explicitly states which form it uses:

I welcome input from others! I'm new to this world!

Closing now that #27 has been merged. The user can figure it out!
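Leaving the choice to the user means mapping a name the user supplies onto one of the four forms. The Rust sketch below shows one way to do that with `std::str::FromStr`. The `NormalizationForm` enum, its variant names, and `is_compatibility` are hypothetical illustrations, not the code merged in #27; the actual normalization would then be delegated to the unicode_normalization crate's `nfc()`, `nfd()`, `nfkc()` or `nfkd()` methods.

```rust
use std::str::FromStr;

/// Hypothetical enum for a user-selectable normalization form
/// (an illustration, not Tidy's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NormalizationForm {
    Nfc,
    Nfd,
    Nfkc,
    Nfkd,
}

impl NormalizationForm {
    /// NFKC and NFKD also apply compatibility mappings
    /// (e.g. the ligature U+FB01 "ﬁ" becomes plain "fi").
    fn is_compatibility(self) -> bool {
        match self {
            NormalizationForm::Nfkc | NormalizationForm::Nfkd => true,
            NormalizationForm::Nfc | NormalizationForm::Nfd => false,
        }
    }
}

impl FromStr for NormalizationForm {
    type Err = String;

    /// Parses a case-insensitive form name such as "nfc" or "NFKD".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "nfc" => Ok(NormalizationForm::Nfc),
            "nfd" => Ok(NormalizationForm::Nfd),
            "nfkc" => Ok(NormalizationForm::Nfkc),
            "nfkd" => Ok(NormalizationForm::Nfkd),
            other => Err(format!("unknown normalization form: {}", other)),
        }
    }
}

fn main() {
    // Simulated user input, e.g. values of a command-line option.
    for name in &["nfc", "NFKD", "nfx"] {
        match name.parse::<NormalizationForm>() {
            Ok(form) => println!(
                "{} -> {:?} (compatibility: {})",
                name,
                form,
                form.is_compatibility()
            ),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

Parsing into an enum up front means an invalid name is rejected once, before any words are processed, rather than being checked for every word in the list.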