JuliaStrings / utf8proc

a clean C library for processing UTF-8 Unicode data

Home Page: http://juliastrings.github.io/utf8proc/

Undefined character listed as UTF8PROC_CATEGORY_LO

maartenbreddels opened this issue

Hi all,

I found in apache/arrow#7656 that undefined characters (such as https://www.compart.com/en/unicode/U+08BE) are listed as UTF8PROC_CATEGORY_LO (using utf8proc_category). Could this be a bug?

Regards,

Maarten

U+08BE was defined in Unicode 13, and Lo is correct.

(It correctly returns UTF8PROC_CATEGORY_CN for currently unassigned codepoints like U+0378.)

You are correct; it didn't even cross my mind that Unicode changes that fast (apart from emoticons). Thanks!