symfony / polyfill

PHP polyfills

Home Page: https://symfony.com

grapheme_strlen shows different length of emoji ZWJ Sequence when compared to native

Luc45 opened this issue

commented

Take the following emoji for instance: 👩‍👩‍👦‍👦

This emoji consists of four separate emoji glued together by Zero Width Joiner (ZWJ) characters, as shown at https://emojipedia.org/family-woman-woman-boy-boy/.
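
To make the structure of the sequence visible, here is a minimal sketch that prints its code points. It assumes PHP 7.2+ with mbstring available (for `mb_ord()`); the variable names are only for illustration.

```php
<?php
// The family emoji written with explicit code point escapes:
// WOMAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY.
$family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}";

// Split into code points; the /u modifier makes PCRE treat the string as UTF-8.
$codePoints = preg_split('//u', $family, -1, PREG_SPLIT_NO_EMPTY);

// Format each code point as U+XXXX.
$labels = array_map(
    static function (string $char): string {
        return sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    },
    $codePoints
);

echo implode(' ', $labels), PHP_EOL;
echo 'code points: ', count($codePoints), PHP_EOL; // 7
echo 'bytes: ', strlen($family), PHP_EOL;          // 25
```

So the string holds seven code points (four emoji plus three U+200D joiners), which a grapheme-aware function is expected to report as a single cluster.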

When checking the length with the native grapheme_strlen(), it returns 1, while this polyfill returns 4.

This is possibly due to a bug in the GRAPHEME_CLUSTER_RX regex.

This bug should only happen on PCRE_VERSION < 8.32. However, when combined with bug #369, it applies to every PCRE_VERSION that contains a date stamp, which seems to be the default format.

Therefore, the grapheme_strlen function in this polyfill is likely to provide incorrect results, such as in this example:

Expected result grapheme_strlen('👩‍👩‍👦‍👦'):

The test is being conducted using the regex: \X

int(1)
int(1)
int(1)
int(1)

Actual result with the custom cluster grapheme_strlen('👩‍👩‍👦‍👦'):

The test is being conducted using the regex: (?:\r\n|(?:[ -~\x{200C}\x{200D}]|[ᆨ-ᇹ]+|[ᄀ-ᅟ]*(?:[가개갸걔거게겨계고과괘괴교구궈궤귀규그긔기까깨꺄꺠꺼께껴꼐꼬꽈꽤꾀꾜꾸꿔꿰뀌뀨끄끠끼나내냐냬너네녀녜노놔놰뇌뇨누눠눼뉘뉴느늬니다대댜댸더데뎌뎨도돠돼되됴두둬뒈뒤듀드듸디따때땨떄떠떼뗘뗴또똬뙈뙤뚀뚜뚸뛔뛰뜌뜨띄띠라래랴럐러레려례로롸뢔뢰료루뤄뤠뤼류르릐리마매먀먜머메며몌모뫄뫠뫼묘무뭐뭬뮈뮤므믜미바배뱌뱨버베벼볘보봐봬뵈뵤부붜붸뷔뷰브븨비빠빼뺘뺴뻐뻬뼈뼤뽀뽜뽸뾔뾰뿌뿨쀄쀠쀼쁘쁴삐사새샤섀서세셔셰소솨쇄쇠쇼수숴쉐쉬슈스싀시싸쌔쌰썌써쎄쎠쎼쏘쏴쐐쐬쑈쑤쒀쒜쒸쓔쓰씌씨아애야얘어에여예오와왜외요우워웨위유으의이자재쟈쟤저제져졔조좌좨죄죠주줘줴쥐쥬즈즤지짜째쨔쨰쩌쩨쪄쪠쪼쫘쫴쬐쬬쭈쭤쮀쮜쮸쯔쯰찌차채챠챼처체쳐쳬초촤쵀최쵸추춰췌취츄츠츼치카캐캬컈커케켜켸코콰쾌쾨쿄쿠쿼퀘퀴큐크킈키타태탸턔터테텨톄토톼퇘퇴툐투퉈퉤튀튜트틔티파패퍄퍠퍼페펴폐포퐈퐤푀표푸풔풰퓌퓨프픠피하해햐햬허헤혀혜호화홰회효후훠훼휘휴흐희히]?[ᅠ-ᆢ]+|[가-힣])[ᆨ-ᇹ]*|[ᄀ-ᅟ]+|[^\p{Cc}\p{Cf}\p{Zl}\p{Zp}])[\p{Mn}\p{Me}\x{09BE}\x{09D7}\x{0B3E}\x{0B57}\x{0BBE}\x{0BD7}\x{0CC2}\x{0CD5}\x{0CD6}\x{0D3E}\x{0D57}\x{0DCF}\x{0DDF}\x{200C}\x{200D}\x{1D165}\x{1D16E}-\x{1D172}]*|[\p{Cc}\p{Cf}\p{Zl}\p{Zp}])

int(1)
int(4)
int(1)
int(4)
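
For reference, a minimal sketch of this kind of comparison (it is not the snippet that produced the output above). It counts clusters with PCRE's `\X` and, if available, with `grapheme_strlen()`. What `\X` returns depends on the PCRE version PHP was built against, so no particular result is assumed here.

```php
<?php
// WOMAN, ZWJ, WOMAN, ZWJ, BOY, ZWJ, BOY.
$family = "\u{1F469}\u{200D}\u{1F469}\u{200D}\u{1F466}\u{200D}\u{1F466}";

// Count extended grapheme clusters as PCRE sees them via \X.
// The result varies with the PCRE version in use.
$pcreCount = preg_match_all('/\X/u', $family);

// grapheme_strlen() exists when ext-intl (or a polyfill) is loaded.
$graphemeCount = function_exists('grapheme_strlen') ? grapheme_strlen($family) : null;

printf("PCRE_VERSION: %s\n", PCRE_VERSION);
printf("\\X count: %s\n", var_export($pcreCount, true));
printf("grapheme_strlen: %s\n", var_export($graphemeCount, true));
```

If the two counts differ on a given setup, that points at the regex path rather than at the input string.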

commented

I forgot to share the code snippet used on the results above: https://3v4l.org/OPBFq#v8.0.10

Would you agree that once #369 is merged, this issue can be closed? In other words, we would not provide the most recent regexp to people who use older PCRE versions?

Alternatively, would you mind looking at improving this regexp? I'm sure I generated it but I don't remember how. There might be a script somewhere in this repo or maybe in https://github.com/tchwork/utf8

commented

Thanks for asking for my input.

This package requires PHP 7.1, which seems to use PCRE 8.38 according to 3v4l.org: https://3v4l.org/S1bPl

On the PHP versions made available by 3v4l, 8.32 is used on PHP versions below 5.5.9, but I'm not sure if this will always be the case.

Is it possible for PHP 7.1+ to be running PCRE 8.32?

commented

It seems PCRE 8.32 made its way into PHP core in 2013: php/php-src@357ab3c

It was replaced with 8.35 in 2014: php/php-src@dd0e96c

I guess it's fine to drop support for the old PCRE_VERSION. It would be ideal if this could be enforced in composer.json through ext-pcre, but given the non-standard version number of PCRE, it can be challenging to enforce the versions.

https://jubianchi.github.io/semver-check/#/^10%20||%20^8.34/8.34%202013-12-15

Or "ext-pcre": "> 8.32":

https://jubianchi.github.io/semver-check/#/%3E%208.32/8.34%202013-12-15
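
Since the date suffix is what makes the version awkward to constrain, a runtime check is one possible alternative. The sketch below is only an illustration: `pcreVersionAtLeast()` is a hypothetical helper, not something in this package, and it is not necessarily what #369 does. It drops everything after the first space and compares the rest with `version_compare()`.

```php
<?php
/**
 * Hypothetical helper (not part of the polyfill): compare only the numeric
 * part of a PCRE version string such as "8.34 2013-12-15".
 */
function pcreVersionAtLeast(string $rawVersion, string $minimum): bool
{
    // Keep the text before the first space, dropping the release date.
    $numeric = explode(' ', trim($rawVersion), 2)[0];

    return version_compare($numeric, $minimum, '>=');
}

var_dump(pcreVersionAtLeast('8.34 2013-12-15', '8.32')); // bool(true)
var_dump(pcreVersionAtLeast('8.31', '8.32'));            // bool(false)
var_dump(pcreVersionAtLeast(PCRE_VERSION, '8.32'));      // depends on the build
```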

commented

It seems that PHP 7.1.0 requires PCRE > 6.6 to compile.

Only in PHP 7.3 was the version requirement raised to PCRE > 10.30.

These restrictions apply only when compiling PHP against an external PCRE.

commented

Actually, only PCRE2 (10+) is able to handle the initial grapheme_strlen example correctly: https://3v4l.org/grqP9

I'm going to close here because nobody worked on this. Ppl should upgrade to PCRE 10+ (or contribute a fix here ;) )