Incorrect `no-super-linear-move` report with a lookbehind assertion
bhsd-harry opened this issue
Information:
- ESLint version: 8.30.0
- eslint-plugin-regexp version: 1.11.0
Description

The rule `regexp/no-super-linear-move: "error"` reports the error below:

```js
/(?<=^a*)b/;
// Any attack string /a+/ plus some rejecting suffix will cause quadratic runtime because of this quantifier
// regexp/no-super-linear-move
```

This does not seem correct, because there is already a `^` assertion.
Sorry for my misunderstanding. It is quite interesting that rewriting the regular expression as `/^(a*)b/` will be a lot faster.
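A quick way to check that the rewrite finds the same `b` is to compare where each pattern locates it. In `/^(a*)b/` the `b` sits right after the captured run of `a`s, so its index is `m[1].length`. The helper names and sample strings below are illustrative only. This is a spot check on a handful of inputs, not a proof of equivalence.

```javascript
// Original pattern: the lookbehind asserts that only a's precede the b.
const lookbehind = /(?<=^a*)b/;
// Rewrite from the thread: match the a's directly and capture them.
const capturing = /^(a*)b/;

// Index of the `b` found by the lookbehind pattern, or -1 if none.
function bIndexLookbehind(str) {
  const m = lookbehind.exec(str);
  return m ? m.index : -1;
}

// Index of the `b` found by the capturing pattern, or -1 if none.
function bIndexCapturing(str) {
  const m = capturing.exec(str);
  // The b immediately follows the captured run of a's.
  return m ? m[1].length : -1;
}

const samples = ["b", "ab", "aaab", "aaa", "", "ba", "aba", "cab", "aaabb"];
for (const s of samples) {
  console.log(JSON.stringify(s), bIndexLookbehind(s), bIndexCapturing(s));
}
```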
Yes, that's because the regex engine matches `/(?<=^a*)b/` left to right and character by character.
Example: Given the string `aaa`, the engine will start at position 0 and go into the lookbehind, which will match (zero `a`s), only to find that there is no `b` at position 0. At position 1, the lookbehind will match 1 `a`, but there is still no `b`, so on to the next position. At position 2, the lookbehind will match 2 `a`s, but there is still no `b`. At position 3, the lookbehind will match 3 `a`s, but still no `b`. And then we have already reached the end of the string. We found no matches, so the string `aaa` is rejected.
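The walk-through can be mirrored with a small simulation. This is a simplified model, not how V8 or any other engine is actually implemented. At each start position it first runs the lookbehind by walking backwards over `a`s and checking that it reached the start of the string, and only then checks for `b`. The function name `simulateLookbehindFirst` and the `scanned` counter are hypothetical. The model also ignores the extra backtracking steps a real engine takes when `^` fails.

```javascript
// Simplified, hypothetical model of a backtracking engine trying
// /(?<=^a*)b/ at each start position from left to right.
function simulateLookbehindFirst(str) {
  const trace = [];
  for (let i = 0; i <= str.length; i++) {
    // Lookbehind `(?<=^a*)`: walk backwards over a's, then require `^`.
    let j = i;
    while (j > 0 && str[j - 1] === "a") j--;
    const lookbehindOk = j === 0;
    const isB = str[i] === "b";
    // `scanned` is how many a's the lookbehind walked over at position i.
    trace.push({ position: i, scanned: i - j, lookbehindOk, isB });
    if (lookbehindOk && isB) return { match: i, trace };
  }
  return { match: -1, trace };
}

const { match, trace } = simulateLookbehindFirst("aaa");
for (const step of trace) console.log(step);
console.log("match:", match);
```

For `aaa` this reports positions 0 to 3 with `scanned` values 0, 1, 2 and 3, the lookbehind succeeding every time, no `b` anywhere, and a final `match` of -1. That is the same sequence as the walk-through.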
The key insight is that at each position, the regex engine must go through up to O(n) `a`s to match the lookbehind. Since there are n + 1 positions in a string of length n, and each position takes O(n) time to match, the total runtime is O(n^2).
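Spelled out for a rejected input of n `a`s and nothing else: at position i, the lookbehind walks back over i characters, for every position i from 0 to n. The total work is therefore:

```latex
\sum_{i=0}^{n} i \;=\; 0 + 1 + \cdots + n \;=\; \frac{n(n+1)}{2} \;\in\; O(n^2)
```

For the `aaa` example (n = 3), this is 0 + 1 + 2 + 3 = 6 characters.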
Unfortunately, browser regex engines don't do a lot of optimization on the pattern itself and will interpret it pretty much literally. In particular, this regex could have had linear runtime if the regex engine had been smart enough to see that checking for `b` first (O(1)) and then going into the lookbehind (O(n)) is a lot faster.
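To make the missed optimization concrete, here is a hand-written matcher for the same pattern that does the cheap check first. It is only a sketch of the idea: regex engines do not expose such a hook, and `findBFirst` is a hypothetical helper. It enters the backward scan only at positions that hold a `b`, and it returns how many characters were scanned. Because `b` is not `a`, each backward scan stops at the previous non-`a` character, so every `a` is scanned at most once.

```javascript
// Hypothetical matcher for /(?<=^a*)b/ that checks for `b` first (O(1))
// and only then verifies the lookbehind by scanning backwards.
function findBFirst(str) {
  let scanned = 0;
  for (let i = 0; i < str.length; i++) {
    if (str[i] !== "b") continue; // cheap check first
    // Lookbehind: walk back over the run of a's directly before i.
    let j = i;
    while (j > 0 && str[j - 1] === "a") {
      j--;
      scanned++;
    }
    if (j === 0) return { match: i, scanned }; // reached ^
  }
  return { match: -1, scanned };
}

console.log(findBFirst("a".repeat(100000)));       // no `b` anywhere
console.log(findBFirst("c" + "ab".repeat(50000))); // many `b`s, none valid
console.log(findBFirst("aaaab"));
```

The first call scans nothing. The second scans each `a` exactly once (50000 steps for a 100001-character input). The third returns `{ match: 4, scanned: 4 }`.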
But on the upside, the lack of such optimizations makes writing algorithms for detecting these worst cases a lot easier :)
Also, you solved the problem with a capturing group, which works great, but if you must use a lookbehind, you could also do this: `/b(?<=^a*b)/`.
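A quick sanity check that the lookbehind-only rewrite finds the same `b` as the original. Both patterns match exactly one character, the `b`, so comparing `match.index` is enough. The sample strings and the `matchIndex` helper are arbitrary choices, and this is a spot check rather than a proof.

```javascript
const original = /(?<=^a*)b/;
const rewritten = /b(?<=^a*b)/; // match `b` first, then look behind

// Index of the first match of `re` in `str`, or -1 if there is none.
function matchIndex(re, str) {
  const m = re.exec(str);
  return m ? m.index : -1;
}

for (const s of ["b", "ab", "aaab", "aaa", "ba", "cab", "aaabb", ""]) {
  console.log(JSON.stringify(s), matchIndex(original, s), matchIndex(rewritten, s));
}
```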
A bit hacky, but it does work. Note that this isn't a general workaround like capturing groups. This only works when both `a` and `b` are single characters (or character sets, or character classes), and `a` and `b` are disjoint (they are different characters/there is no character accepted by both).
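One way to see why disjointness matters for the running time is a simplified cost model of `/b(?<=^A*b)/`, where the repeated set `A` and `b` are single-character sets given as predicates. The function `workForLookbehindRewrite` and the predicates are hypothetical. The model counts only backward steps and ignores the backtracking a real engine does when `^` fails. With overlapping sets (for example `A` = `[ab]`, that is, `/b(?<=^[ab]*b)/`), each `b` can walk back over all earlier `b`s.

```javascript
// Hypothetical cost model for /b(?<=^A*b)/ with single-character sets.
function workForLookbehindRewrite(str, isA, isB) {
  let steps = 0;
  for (let i = 0; i < str.length; i++) {
    if (!isB(str[i])) continue; // `b` is checked first: O(1)
    // Lookbehind: the b at i is re-read, then walk back over A characters.
    let j = i;
    while (j > 0 && isA(str[j - 1])) {
      j--;
      steps++;
    }
    if (j === 0) return { match: i, steps }; // reached ^
  }
  return { match: -1, steps };
}

const isA = (ch) => ch === "a";
const isB = (ch) => ch === "b";
const isAorB = (ch) => ch === "a" || ch === "b"; // overlaps with isB

const input = "c" + "b".repeat(1000);
console.log(workForLookbehindRewrite(input, isA, isB));    // disjoint sets
console.log(workForLookbehindRewrite(input, isAorB, isB)); // overlapping sets
```

With the disjoint sets, every `b` is rejected after looking at one neighbour, so `steps` stays at 0. With `[ab]` as the repeated set, each `b` walks back over all earlier `b`s, giving 0 + 1 + ... + 999 = 499500 steps, which is quadratic growth again.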
So keep using capturing groups if you can. There are simply fewer surprises with them.
@RunDevelopment Thank you so much for your detailed explanation and a surprising solution!