Problems with FE0F character

Question

oriolbcn opened this issue 5 years ago · comments

It seems that the FE0F character (variation selector-16) that some emojis carry is not properly taken into account. This causes rendering problems in Safari, because the library leaves the FE0F character in place when it should be treated as part of the emoji.

Emojis are a critical part of our business. We recently migrated our old implementation to this library and are even paying for a license, and this is a real blocker for us right now: we cannot use the library as it stands because it would give our users a bad UX.

How to reproduce

Open the toImage demo page in Safari and try it with the ❤️ emoji. The heart is converted, but the FE0F character is left behind after it.

What I have discovered so far

The problem comes from a combination of the unicode maps that are constructed + the contents of the emojiList object. Specifically these 2 lines:

https://github.com/emojione/emojione/blob/master/lib/js/emojione.js#L409

https://github.com/emojione/emojione/blob/master/lib/js/emojione.js#L400

These lines use the uc_output property. However, for all emojis that end in FE0F, uc_output is missing that part. For instance, the uc_output of ❤️ is just 2764. Therefore, when converting U+2764 U+FE0F, the library replaces only U+2764 and leaves U+FE0F hanging.

I have noticed that the uc_match property has the proper combination (2764-FE0F), so I tried to use that instead, but then it fails for complex emojis like Family and Couple that use the 200D joiner. For those, uc_output has the joiner but uc_match does not.

In summary:

uc_output has the 200D joiner character but not the FE0F character. Fails with ❤️, succeeds with 👨‍👧.
uc_match has the FE0F character but not the 200D joiner. Succeeds with ❤️, fails with 👨‍👧.
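
To make the two failure modes concrete, here is a minimal sketch. The property names uc_output and uc_match come from the library, but the entries below are simplified stand-ins for illustration, and toChars and replaceWith are hypothetical helpers, not the library's code.

```javascript
// Simplified, illustrative entries (not copied from the real emojiList).
const heart = { uc_output: "2764", uc_match: "2764-fe0f" };
const family = { uc_output: "1f468-200d-1f467", uc_match: "1f468-1f467" };

// Hypothetical helper: turn "2764-fe0f" into the actual characters.
function toChars(hexSequence) {
  return String.fromCodePoint(
    ...hexSequence.split("-").map((hex) => parseInt(hex, 16))
  );
}

// Hypothetical helper: replace every occurrence of one sequence.
function replaceWith(text, hexSequence, placeholder) {
  return text.split(toChars(hexSequence)).join(placeholder);
}

const heartInput = "\u2764\uFE0F";              // heart followed by FE0F
const familyInput = "\u{1F468}\u200D\u{1F467}"; // man, 200D joiner, girl

// uc_output: the heart leaves FE0F behind, the family is replaced cleanly.
const heartViaOutput = replaceWith(heartInput, heart.uc_output, "[heart]");
const familyViaOutput = replaceWith(familyInput, family.uc_output, "[family]");

// uc_match: the heart is replaced cleanly, the family is not found at all.
const heartViaMatch = replaceWith(heartInput, heart.uc_match, "[heart]");
const familyViaMatch = replaceWith(familyInput, family.uc_match, "[family]");
```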

So I think the underlying problem is in the way the emojiList object is built. I have tried to find the script that generates it but I can't find it. I guess you keep it private.

Possible solutions I can think of

Best (more difficult)

The emoji.json file has an array of multiple matches for each emoji. I think the library should consider all these for a match. These include all variations with and without the FE0F and 200D characters, and maybe even other optional characters. This ensures that the emoji will be matched no matter how it is written.
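
A rough sketch of what I mean, assuming each emoji carries an array of alternate sequences. The field name alternates, the shortnames and the data below are made up for illustration; the real emoji.json layout may differ.

```javascript
// Hypothetical data shape standing in for emoji.json.
const emojiData = {
  ":heart:": { alternates: ["2764-fe0f", "2764"] },
  ":family_man_girl:": { alternates: ["1f468-200d-1f467", "1f468-1f467"] },
};

// Hypothetical helper: turn "2764-fe0f" into the actual characters.
function hexToChars(hexSequence) {
  return String.fromCodePoint(
    ...hexSequence.split("-").map((hex) => parseInt(hex, 16))
  );
}

// Map every known spelling of an emoji to its shortname.
const spellingToShortname = new Map();
for (const [shortname, { alternates }] of Object.entries(emojiData)) {
  for (const sequence of alternates) {
    spellingToShortname.set(hexToChars(sequence), shortname);
  }
}

// One alternation with the longest spellings first, so the variant that
// includes FE0F or 200D wins over the bare code points.
const anySpelling = new RegExp(
  [...spellingToShortname.keys()]
    .sort((a, b) => b.length - a.length)
    .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|"),
  "gu"
);

function allSpellingsToShortnames(text) {
  return text.replace(anySpelling, (match) => spellingToShortname.get(match));
}
```

With this, the same emoji is recognized whether or not the optional characters are present, for example both "\u2764\uFE0F" and a bare "\u2764" become :heart:.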

Good

Standardize the values of uc_match and uc_output. Always have the full unicode sequence (with FE0F and 200D) in one property and the sequence without them in the other property.

Hacky

I guess that if, with the current emojiList, we take both uc_match and uc_output to build the maps (always placing the longer of the two first), it would work, but I have not tested it yet.
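
Roughly, the idea would look like the sketch below. It is untested against the real library: the entries are simplified stand-ins for emojiList, and the helper names are hypothetical.

```javascript
// Simplified, illustrative entries (not copied from the real emojiList).
const sampleEmojiList = {
  ":heart:": { uc_output: "2764", uc_match: "2764-fe0f" },
  ":family_man_girl:": { uc_output: "1f468-200d-1f467", uc_match: "1f468-1f467" },
};

// Hypothetical helper: turn "2764-fe0f" into the actual characters.
function sequenceToChars(hexSequence) {
  return String.fromCodePoint(
    ...hexSequence.split("-").map((hex) => parseInt(hex, 16))
  );
}

// Take both properties of every emoji as [characters, shortname] pairs.
const candidates = [];
for (const [shortname, entry] of Object.entries(sampleEmojiList)) {
  candidates.push([sequenceToChars(entry.uc_match), shortname]);
  candidates.push([sequenceToChars(entry.uc_output), shortname]);
}

// Longest first, so the fuller sequence is tried before the shorter one.
candidates.sort((a, b) => b[0].length - a[0].length);

function bothPropertiesToShortnames(text) {
  let result = text;
  for (const [chars, shortname] of candidates) {
    result = result.split(chars).join(shortname);
  }
  return result;
}
```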

Casey A Henson · Answer 1 · Sat Feb 16 2019 07:18:38 GMT+0800 (China Standard Time)

@oriolbcn we've published a solution for this that I think works fairly well. Please try this out and let me know how it goes.

As you'll see, trailing VS16 characters are simply removed after replacement has occurred, an approach that has proven successful in variations of this library that we've used in our applications. If you run into any issues with it, we can revisit the alternatives.
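
For readers following along, the general idea can be sketched as below. This is only an illustration of stripping a trailing VS16 (U+FE0F) after replacement, not the code that was published; the function name and the image tag are placeholders.

```javascript
const VS16 = "\uFE0F";
const HEART_TAG = '<img alt="heart">'; // placeholder for the generated image tag

// Hypothetical two-step conversion for a single emoji.
function convertHeart(text) {
  // Step 1: replace the bare code point, as the uc_output-based map does.
  const replaced = text.split("\u2764").join(HEART_TAG);
  // Step 2: remove a VS16 left directly after the inserted tag.
  return replaced.split(HEART_TAG + VS16).join(HEART_TAG);
}
```

A VS16 that does not follow a replacement is left alone, so text the library did not convert keeps its original characters.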

Oriol Collell · Answer 2 · Mon Feb 18 2019 19:05:32 GMT+0800 (China Standard Time)

Tried it and it works great 👍 I don't love the solution though; to me the character should be taken as part of the emoji replacement. But it does the job.