rurban / re-engine-PCRE2

use pcre-jit instead of slow perl regex

t/perl/regexp.t fails with PCRE2 10.32-RC1

ppisar opened this issue

PCRE2 has a release candidate for 10.32, and the following t/perl/regexp.t tests fail with it:

$ perl -Iblib/{arch,lib} t/perl/regexp.t 1 t/perl/re_tests 1443 1444 1960 1961 
1..4
# 1 iterations
not ok 1 () /\N{U+41}\x{c1}/i:a\x{e1}:y:$&:a\x{e1} => `/', match=
$subject = "a\341";

$got = "/";

                ;
                $match = ($subject =~ m/\N{U+41}\x{c1}/i) while $c--;
                $got = "$&";

not ok 2 () /[\N{U+41}\x{c1}]/i:\x{e1}:y:$&:\x{e1} => `/', match=
$subject = "\341";

$got = "/";

                ;
                $match = ($subject =~ m/[\N{U+41}\x{c1}]/i) while $c--;
                $got = "$&";

not ok 3 () foo(*ACCEPT:foo):foo:y:$::REGMARK:foo => `', match=1
$subject = "foo";

$got = "";

                ;
                $match = ($subject =~ m'foo(*ACCEPT:foo)') while $c--;
                $got = "$::REGMARK";

not ok 4 () (foo(*ACCEPT:foo)):foo:y:$::REGMARK:foo => `', match=1
$subject = "foo";

$got = "";

                ;
                $match = ($subject =~ m'(foo(*ACCEPT:foo))') while $c--;
                $got = "$::REGMARK";

This may be caused by these new PCRE2 features:

27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

29. Add support for \N{U+dddd}, but not in EBCDIC environments.
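The expected results for tests 3 and 4 above come from Perl's own engine, where the argument given to (*ACCEPT:NAME) is expected to end up in $::REGMARK. That is exactly what those re_tests entries check. The standalone sketch below runs the same two patterns with plain Perl only, without loading re::engine::PCRE2, to show the reference behaviour. It assumes a Perl recent enough to accept an argument on (*ACCEPT). The thread does not state which Perl version is required, and the hash name is illustrative only.

```perl
use strict;
use warnings;

# Illustrative check, not part of the test suite: run the two (*ACCEPT:NAME)
# patterns from tests 3 and 4 with Perl's built-in regex engine.
# Assumes a Perl whose core engine accepts (*ACCEPT:arg).
my %accept_mark;    # hypothetical name: pattern => resulting $::REGMARK
for my $pattern ('foo(*ACCEPT:foo)', '(foo(*ACCEPT:foo))') {
    $::REGMARK = undef;    # clear any mark left over from a previous match
    my $matched = "foo" =~ /$pattern/;
    # re_tests expects a match with $::REGMARK set to the verb's argument.
    $accept_mark{$pattern} = $matched ? $::REGMARK : undef;
    printf "%-20s REGMARK=%s\n", $pattern, $accept_mark{$pattern} // '(no match)';
}
```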

I have confirmed that the failures are triggered by these new features, which were introduced in the following PCRE2 commits:

commit 1ad8a5e6add80b53753a4b78589ff41fc58dad18
Author: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>
Date:   Sat Jul 21 14:34:51 2018 +0000

    Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
    followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.
    
    
    git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@968 6239d852-aaf2-0410-a92c-79f79f948069

and

commit f0921f962e383718a302729151ee21860b419d79
Author: ph10 <ph10@6239d852-aaf2-0410-a92c-79f79f948069>
Date:   Fri Jul 27 16:30:40 2018 +0000

    Add support for \N{U+dd...}, for ASCII and Unicode modes only.
    
    
    git-svn-id: svn://vcs.exim.org/pcre2/code/trunk@972 6239d852-aaf2-0410-a92c-79f79f948069

Thanks, confirmed.

I've filed two PCRE2 bugs for these: https://bugs.exim.org/show_bug.cgi?id=2306 and https://bugs.exim.org/show_bug.cgi?id=2305.
2305 is clearly a PCRE2 regression, and 2306 also looks like a PCRE2 bug to me.

For the upcoming 0.15 release, added special cases to the tests where PCRE2 deviates from Perl 5.

fixup for libpcre2 >= 10.32 unicode semantic changes:

  • Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
    followed by (*ACCEPT) in an assertion.
  • Add support for \N{U+dd...}, for ASCII and Unicode modes only.
    Caused unicode regression https://bugs.exim.org/show_bug.cgi?id=2305
    (need to observe unicode folding rules for \N{U+NNNN} chars)
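The "unicode folding rules" mentioned in the fixup are the ones Perl applies in tests 1 and 2. Both subjects are byte strings (`"a\341"`, `"\341"`). Because the pattern contains a `\N{...}` escape, Perl compiles it under Unicode rules, so `/i` lets U+00C1 (capital A with acute) match U+00E1. The sketch below reproduces this with Perl's built-in engine only. It also adds a control match without `\N{...}`, which is our own addition and not one of the re_tests entries. In that case native byte semantics apply and the two characters are not case variants. The variable names are illustrative.

```perl
use strict;
use warnings;

# Illustrative check: tests 1 and 2 from the output above, run with Perl's
# built-in engine. Subjects are byte strings (not UTF-8 flagged), as in the log.
my $s1 = "a\x{e1}";    # "a" + U+00E1 LATIN SMALL LETTER A WITH ACUTE
my $s2 = "\x{e1}";

# \N{U+...} in the pattern switches Perl to Unicode rules, so under /i
# U+0041 matches "a" and U+00C1 matches U+00E1.
my $m1 = $s1 =~ m/\N{U+41}\x{c1}/i   ? $& : undef;
my $m2 = $s2 =~ m/[\N{U+41}\x{c1}]/i ? $& : undef;

# Control case (not from re_tests): no \N{...}, no UTF-8 data, no
# unicode_strings feature, so native rules apply and \x{c1} does not fold.
my $m3 = $s2 =~ m/\x{c1}/i ? $& : undef;

printf "test 1:    %s\n", defined $m1 ? 'match' : 'no match';
printf "test 2:    %s\n", defined $m2 ? 'match' : 'no match';
printf "control:   %s\n", defined $m3 ? 'match' : 'no match';
```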