Undefined character listed as UTF8PROC_CATEGORY_LO
maartenbreddels opened this issue
Hi all,
I found in apache/arrow#7656 that undefined characters (such as https://www.compart.com/en/unicode/U+08BE) are listed as UTF8PROC_CATEGORY_LO (using utf8proc_category). Could this be a bug?
Regards,
Maarten
U+08BE was defined in Unicode 13, and Lo is correct.
(It correctly returns UTF8PROC_CATEGORY_CN for currently unassigned codepoints like U+0378.)
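To see how the reported category depends on the Unicode version of the database being queried, here is a minimal sketch. It does not call utf8proc; it assumes Python's standard `unicodedata` module as a stand-in, and the helper name `describe` is made up for illustration. An interpreter bundling Unicode 13 or later should report Lo for U+08BE, while one with an older database would report Cn.

```python
import unicodedata


def describe(cp: int) -> str:
    """Format a code point with the general category from Python's bundled Unicode data."""
    # unicodedata.category returns the two-letter general category, e.g. "Lo" or "Cn".
    return f"U+{cp:04X}: {unicodedata.category(chr(cp))}"


# The answer for a code point depends on which Unicode version this database was built from.
print("Unicode database version:", unicodedata.unidata_version)

# U+08BE was added in Unicode 13; U+0378 is an unassigned code point.
for cp in (0x08BE, 0x0378):
    print(describe(cp))
```

Comparing the printed database version against a reference site is a quick way to tell a real bug from a version mismatch.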
You are correct. It didn't even cross my mind that Unicode changes that fast (apart from emoji). Thanks!