k-takata / Onigmo

Onigmo is a regular expressions library forked from Oniguruma.

Captured strings of (.)(\g<1>) are the same

scivola opened this issue · comments

Ruby code:

p /(.)(\g<1>)/.match("AB")
# => #<MatchData "AB" 1:"B" 2:"B">

I expected the following result:

# => #<MatchData "AB" 1:"A" 2:"B">

This is probably the specified behavior.

Extend doc about subexpression calls #105
https://github.com/k-takata/Onigmo/pull/105/files
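A minimal sketch of what seems to be happening, plus one possible workaround: the call `\g<1>` re-runs group 1, so that group's capture is overwritten by the later match. Wrapping the callable group in an outer group that is never called appears to preserve the first character. The group names `first`, `c`, and `second` are hypothetical and chosen only for illustration.

```ruby
# Reported behavior: the subexpression call \g<1> re-enters group 1,
# so group 1 ends up holding the text matched by the call.
m = /(.)(\g<1>)/.match("AB")
p m[1]  # "B" according to the report above
p m[2]  # "B"

# Possible workaround (an assumption, not taken from the Onigmo docs):
# capture the first character in an outer group that is never called.
w = /(?<first>(?<c>.))(?<second>\g<c>)/.match("AB")
p w[:first]   # the outer group is not re-entered by \g<c>
p w[:second]
```

Only `c` is the target of the call here, so only `c` should be overwritten; `first` keeps whatever it captured on the initial pass.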

The Oniguruma regex engine behaves the same way.

p "ABXY".sub(/(.)\g<1>(.)\g<2>/,'[\1][\2]')
# result: "[B][Y]"

p "ABXY".sub(/(?<q>.)(?<q>.)(?<n>.)(?<n>.)/,'[\k<q>][\k<n>]')
# result: "[B][Y]"

p "ABXY".sub(/(.)\g<2>(.)\g<1>/,'[\1][\2]')
# result: "[Y][X]"

p "ABXY".sub(/(?<q>.)(?<n>.)(?<n>.)(?<q>.)/,'[\k<q>][\k<n>]')
# result: "[Y][X]"

https://github.com/kkos/oniguruma/blob/master/doc/RE
line 410

When backreferencing with a name that is assigned to more than one groups,
the last group with the name is checked first, if not matched then the
previous one with the name, and so on, until there is a match.
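The quoted rule can be checked from Ruby with a small sketch. The pattern and the name `c` are made up for illustration; the anchors `\A` and `\z` are added only so that each test string must match as a whole.

```ruby
# Two groups share the name "c"; \k<c> should try the last group first,
# then fall back to the earlier one if that comparison fails.
re = /\A(?<c>a)(?<c>b)\k<c>\z/

puts re.match?("abb")  # \k<c> matches via the last group ("b")
puts re.match?("aba")  # the last group fails, the earlier group ("a") matches
puts re.match?("abc")  # neither group's text matches "c"

# Looking the name up on MatchData returns the last group that matched.
last = /(?<c>a)(?<c>b)/.match("ab")[:c]
puts last
```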

/(.)(\g<1>)/
is equivalent to
/(?<m>.)(?<t>(?<m>.))/

MatchData1: \k<m>    # ==B
MatchData2: \k<t>    # ==B
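The comparison above can be run side by side in Ruby. This is only a sketch of the analogy drawn in this comment; the names `m` and `t` come from the pattern above, while the variable names `called` and `named` are illustrative.

```ruby
# Subexpression-call form from the original report.
called = /(.)(\g<1>)/.match("AB")

# Duplicate-name form suggested as its equivalent.
named = /(?<m>.)(?<t>(?<m>.))/.match("AB")

p called[1], called[2]   # both "B" per the report
p named[:m], named[:t]   # name lookup picks the last matched "m" group
```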

Named backrefs behave differently in Perl syntax #74

https://github.com/k-takata/Onigmo/blob/master/sample/simple.c
(lines 16, 17, and 20 rewritten)

Onigmo 6.2.0 with ONIG_SYNTAX_PERL

"(.)((?1))\\k<1>"
"ABB"

[result]
match at 0
0: (0-3)
1: (1-2)
2: (1-2)

"(.)((?1))\\g{1}"
"ABB"

[result]
match at 0
0: (0-3)
1: (1-2)
2: (1-2)

"(.)((?1))\\k<1>"
"ABA"

[result] search fail

"(.)((?1))\\g{1}"
"ABA"

[result] search fail
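For comparison, a rough Ruby analogue of the same check. Ruby uses Onigmo with its own syntax rather than ONIG_SYNTAX_PERL, so `(?1)` is written as `\g<1>` here; whether the two syntaxes agree in every case is an assumption, not something the results above establish.

```ruby
# Group 1 is called inside group 2, then back-referenced by number.
re = /(.)(\g<1>)\k<1>/

hit = re.match("ABB")
p hit.offset(0)  # whole match
p hit.offset(1)  # group 1 after being overwritten by the call
p hit.offset(2)  # group 2

miss = re.match("ABA")
p miss           # no match: \k<1> would need "B" at the third position
```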