brynne8 / ccnorm

Lua Unicode normalization data

ccnorm

Lua Unicode normalization data. It is similar to the skeleton algorithm from Unicode UTS #39 (Unicode Security Mechanisms), but it also takes readability and letter case into account.

Latin letters

Any Unicode character that looks similar to a Latin letter is normalized to that Latin letter, even if it is a digit or a punctuation mark. Normalization to Latin letters is based on shape, so the Greek letter ν (lowercase nu) is normalized to the Latin letter V.
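As a rough illustration of shape-based normalization, the sketch below applies a small lookup table to a UTF-8 string one character at a time. The names `sample_map` and `normalize` are hypothetical, and the table is a hand-written stand-in: it does not reflect the actual contents or layout of ccnorm.lua. Only the ν entry comes from the description above; the digit entry is an assumed example of a number that resembles a letter.

```lua
-- Hypothetical stand-in table; NOT the real data or format of ccnorm.lua.
local sample_map = {
  ["ν"] = "V", -- Greek small letter nu, the example from the text above
  ["0"] = "O", -- assumed mapping: a digit that looks like a letter
}

-- Replace every UTF-8 character found in `map`; leave the rest unchanged.
local function normalize(text, map)
  -- The pattern matches one UTF-8 sequence: a lead byte followed by
  -- zero or more continuation bytes.
  return (text:gsub("[\1-\127\194-\244][\128-\191]*", function(ch)
    return map[ch] -- returning nil keeps the original character
  end))
end

print(normalize("νote", sample_map)) --> Vote
```

Characters with no entry in the table pass through untouched, so a string made only of plain Latin letters comes back unchanged.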

Chinese characters

Chinese characters (a.k.a. kanji) are normalized to Simplified Chinese as far as possible. A normalized Chinese sentence should remain readable to native Chinese readers.

Contributing

ccnorm.lua is automatically generated, so please report bugs in Issues rather than sending pull requests.

About

License: GNU General Public License v3.0


Languages

Language: Lua 100.0%