JuliaStrings / utf8proc

a clean C library for processing UTF-8 Unicode data

Home Page: http://juliastrings.github.io/utf8proc/

Any convenient way to detect if codepoint is an emoji?

dundargoc opened this issue · comments

I looked but couldn't find anything relevant.

It depends on your definition of "emoji". One option is to check whether utf8proc_get_property(codepoint)->boundclass == UTF8PROC_BOUNDCLASS_EXTENDED_PICTOGRAPHIC. See also this Stack Overflow post.

It depends on your definition of "emoji".

I understand. Let me briefly summarize our use case:

The neovim editor has recently added utf8proc as a dependency, and we are actively trying to remove as much duplicate code as possible and rely on utf8proc instead. Right now I believe neovim defines an "emoji" as any entry in emoji-data.txt whose property name starts with the string "Emoji", meaning the rows with "Emoji", "Emoji_Presentation", "Emoji_Modifier_Base", etc., but not "Extended_Pictographic". I don't believe this distinction was intentional; it is merely a historical blunder. For our use cases we likely don't need to differentiate between "regular" emojis and extended pictographics; I suspect that for the purposes of presentation we can include these as "emojis" as well.
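To make the rule described above concrete, here is a rough sketch of how rows of emoji-data.txt could be filtered by property name. It is not neovim's actual code: the function name parse_emoji_line and the buffer size are assumptions made for this example, and it relies only on the usual "code point or range ; property # comment" layout of that file.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of emoji-data.txt, e.g.
 *   "0030..0039    ; Emoji                # ..."
 *   "1F600         ; Emoji_Presentation   # ..."
 * If the line is a data row, store the inclusive code point range in
 * *lo and *hi and return true when the property name begins with "Emoji".
 * Comment lines, blank lines, and rows for other properties (such as
 * Extended_Pictographic) return false. */
static bool parse_emoji_line(const char *line, unsigned *lo, unsigned *hi)
{
    char prop[64];

    if (sscanf(line, "%x..%x ; %63s", lo, hi, prop) == 3) {
        /* range entry: *lo and *hi are both set */
    } else if (sscanf(line, "%x ; %63s", lo, prop) == 2) {
        *hi = *lo; /* single code point */
    } else {
        return false; /* comment, blank, or malformed line */
    }
    return strncmp(prop, "Emoji", 5) == 0;
}
```

Calling this on each line of the file and collecting the accepted ranges would give the "starts with Emoji" set; rows tagged Extended_Pictographic are skipped because their property name does not begin with "Emoji".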

I'll be honest, I'm still a beginner at Unicode. If extended pictographic is a superset of "regular" emojis, then that should work. I will research more, experiment with your suggestion, and get back to you.

Thanks for letting us know, it's great to hear that utf8proc is useful for you.