Any convenient way to detect if a codepoint is an emoji?

dundargoc opened this issue

I looked but couldn't find anything relevant.

Steven G. Johnson · Answer 1 · Wed Jul 10 2024 19:45:51 GMT+0800 (China Standard Time)

It depends on your definition of "emoji". One option is to check whether `utf8proc_get_property(codepoint)->boundclass == UTF8PROC_BOUNDCLASS_EXTENDED_PICTOGRAPHIC`. See also this Stack Overflow post.

dundargoc · Answer 2 · Wed Jul 10 2024 20:35:22 GMT+0800 (China Standard Time)

> It depends on your definition of "emoji".

I understand. Let me briefly summarize our use case:

The Neovim editor recently added utf8proc as a dependency, and we are actively trying to remove duplicate code and rely on utf8proc instead wherever we can. Right now I believe Neovim defines an "emoji" as any code point listed in emoji-data.txt under a property whose name starts with "Emoji", i.e. the rows marked "Emoji", "Emoji_Presentation", "Emoji_Modifier_Base", etc., but not "Extended_Pictographic". I don't believe this distinction was intentional; it is more likely a historical accident. For our use cases we probably don't need to distinguish "regular" emojis from Extended_Pictographic characters; I suspect that, for presentation purposes, we can treat those as "emojis" as well.
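The rule described above can be sketched as a classifier for individual data lines of emoji-data.txt. This is a hypothetical illustration, not Neovim's actual code: `parse_emoji_data_line` and `line_marks_emoji` are made-up names, and the parser only handles the `RANGE ; Property # comment` data-line layout used by that file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parses one data line of emoji-data.txt, e.g.
 *   "1F600..1F64F  ; Emoji_Presentation   # ..." or
 *   "00A9          ; Emoji                # ...".
 * On success, stores the code point range in *lo..*hi, copies the property
 * name into prop, and returns true; comment and blank lines return false. */
static bool parse_emoji_data_line(const char *line, unsigned *lo,
                                  unsigned *hi, char prop[64])
{
    assert(line != NULL && lo != NULL && hi != NULL && prop != NULL);
    if (sscanf(line, "%x..%x ; %63[A-Za-z_]", lo, hi, prop) == 3)
        return true; /* range line */
    if (sscanf(line, "%x ; %63[A-Za-z_]", lo, prop) == 2) {
        *hi = *lo;   /* single code point line */
        return true;
    }
    return false;
}

/* Hypothetical rule: a code point counts as an "emoji" if a line covering
 * it has a property name starting with "Emoji", so lines tagged
 * "Extended_Pictographic" are ignored. */
static bool line_marks_emoji(const char *line, unsigned cp)
{
    unsigned lo, hi;
    char prop[64];
    if (!parse_emoji_data_line(line, &lo, &hi, prop))
        return false;
    return lo <= cp && cp <= hi && strncmp(prop, "Emoji", 5) == 0;
}
```

A full lookup would apply this to every line of the file (or, more efficiently, build a sorted range table once). Under this prefix rule, "Emoji_Component" rows also count, since that property name starts with "Emoji" too.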

To be honest, I'm still a beginner when it comes to Unicode. If Extended_Pictographic is a superset of the "regular" emojis, then that should work. I'll research this further, experiment with your suggestion, and get back to you.

Steven G. Johnson · Answer 3 · Thu Jul 11 2024 06:03:56 GMT+0800 (China Standard Time)

Thanks for letting us know; it's great to hear that utf8proc is useful to you.