hunspell / hunspell

The most popular spellchecking library.

Home Page: http://hunspell.github.io/

4-byte UTF-8 and 2-unit UTF-16 (surrogate pair) support required for some Unicode 15.0.0 blocks

rovasiras opened this issue

The Unicode Standard 15.0.0 adds two important blocks outside the BMP:
U+10EC0 - U+10EFF Arabic Extended-C
U+1E030 - U+1E08F Cyrillic Extended-D

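@caolanm Supporting these blocks requires the following steps in the u8_u16 function: decode the 4-byte UTF-8 sequence to a UTF-32 code point, then split that code point into two surrogate words. The u16_u8 function needs the mirrored method: combine the surrogate pair into a UTF-32 code point, then encode it as 4 bytes of UTF-8. The correct method is described in the Unicode FAQ on UTF-8, UTF-16 and UTF-32.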
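To make the two steps concrete, here is a minimal standalone sketch of both directions. The function names `utf8_to_utf16`, `utf16_to_utf8`, `append_utf16` and `append_utf8` are hypothetical and do not match Hunspell's actual `u8_u16` / `u16_u8` signatures. Replacing malformed input with U+FFFD is my own assumption, and the sketch does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not Hunspell's real API.

// Append one code point as UTF-16, splitting non-BMP values into a
// high/low surrogate pair.
inline void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low
    }
}

// Append one code point as UTF-8 (1 to 4 bytes).
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// UTF-8 -> UTF-32 -> UTF-16. Malformed input becomes U+FFFD (assumption).
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else { out.push_back(u'\uFFFD'); ++i; continue; }  // bad lead byte
        if (i + len > in.size()) { out.push_back(u'\uFFFD'); break; }  // truncated
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(u'\uFFFD'); ++i; continue; }
        append_utf16(out, cp);
        i += len;
    }
    return out;
}

// UTF-16 -> UTF-32 -> UTF-8. Unpaired surrogates become U+FFFD (assumption).
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Recombine the surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For example, U+1E030 from Cyrillic Extended-D is the UTF-8 sequence F0 9E 80 B0. `utf8_to_utf16` turns it into the surrogate pair D838 DC30, and `utf16_to_utf8` restores the original four bytes.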

A temporary, backward-compatible workaround could be to use ICONV and OCONV to convert the non-BMP characters to, for example, user-defined characters.