Which Unicode normalization should Tidy use?
sts10 opened this issue · comments
Sam Schlinkert commented
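It seems like there are four major "normalization forms" for Unicode:
- NFD (Canonical Decomposition)
- NFC (Canonical Decomposition, followed by Canonical Composition)
- NFKD (Compatibility Decomposition)
- NFKC (Compatibility Decomposition, followed by Canonical Composition)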
The unicode_normalization crate I'm using supports all four. I somewhat arbitrarily chose NFC in #25, but I'd like to make sure that's the best choice for Tidy users. I could also let users choose among all four!
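To make the difference between the four forms concrete, here is a small sketch. It uses Python's standard-library `unicodedata` module rather than the Rust crate, purely so it runs without dependencies; the sample string is an arbitrary choice for illustration, not something from Tidy.

```python
import unicodedata

# Sample text (an illustrative assumption): "é" as one precomposed code point
# (U+00E9), a space, and the "ﬁ" ligature (U+FB01), a compatibility character.
sample = "caf\u00e9 \ufb01"

# Apply each of the four normalization forms to the same input.
forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Print the code points so the differences are visible even when the
# rendered strings look identical.
for form, text in forms.items():
    print(form, [f"U+{ord(ch):04X}" for ch in text])
```

The canonical forms (NFC, NFD) only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also replace the ligature with plain "f" and "i". For a word list, that means the compatibility forms can change what a word looks like, not just how it is encoded.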
What other similar projects use
So far I've only found one project that states explicitly which form it uses:
- Bitcoin's bips project uses NFKD everywhere: "The wordlist can contain native characters, but they must be encoded in UTF-8 using Normalization Form Compatibility Decomposition (NFKD)."
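As a rough illustration of why a word list might fix a single form, the sketch below normalizes every word and then removes duplicates. It is written in Python with the standard `unicodedata` module; the function name `normalize_words` and the sample words are hypothetical and are not taken from Tidy or from the bips project.

```python
import unicodedata


def normalize_words(words, form="NFKD"):
    """Normalize each word, then drop duplicates, keeping first-seen order.

    `normalize_words` is a hypothetical helper for illustration only.
    """
    seen = set()
    result = []
    for word in words:
        normalized = unicodedata.normalize(form, word)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


# Two spellings of "café" that render the same but differ in code points:
# precomposed U+00E9 versus "e" followed by combining accent U+0301.
words = ["caf\u00e9", "cafe\u0301", "zebra"]
deduped = normalize_words(words)
```

Without normalization, the two "café" entries compare as different strings and both survive de-duplication; after normalizing to one form they collapse into a single entry.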
Input from others is welcome! I'm new to this world!
Sam Schlinkert commented
Closing thanks to #27 being merged. The user can figure it out!