Incorrect `no-super-linear-move` report with a lookbehind assertion

Question

bhsd-harry opened this issue 2 years ago

Bhsd commented 2 years ago

Information:

ESLint version: 8.30.0
eslint-plugin-regexp version: 1.11.0

Description

The rule regexp/no-super-linear-move: "error" reports the error below:

/(?<=^a*)b/;
// Any attack string /a+/ plus some rejecting suffix will cause quadratic runtime because of this quantifier
// regexp/no-super-linear-move

This does not seem correct, because there is already a ^ assertion.

Bhsd · Answer 1 · Wed Dec 28 2022 17:18:36 GMT+0800 (China Standard Time)

Sorry for my misunderstanding. It is quite interesting that rewriting the regular expression as /^(a*)b/ will be a lot faster.

Michael Schmidt · Answer 2 · Wed Dec 28 2022 18:14:50 GMT+0800 (China Standard Time)

Yes, that's because the regex engine matches /(?<=^a*)b/ left to right and character by character.

Example: Given the string aaa, it will start at position 0 and go into the lookbehind, which will match (with zero as), only to find that there is no b at position 0. At position 1, the lookbehind will match 1 a, but there is still no b, so on to the next position. At position 2, the lookbehind will match 2 as, but there is still no b. At position 3, the lookbehind will match 3 as, but still no b. And then we have already reached the end of the string. We found no matches, so the string aaa is rejected.

The key insight is that at each position, the regex engine must go through O(n) many as to match the lookbehind. Since there are n positions in a string of length n, and each position takes O(n) time to match, the total runtime is O(n^2).
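
To make the counting argument concrete, here is a minimal sketch of a cost model. It is not a real regex engine and not how any browser implements matching; the function name lookbehindFirstSteps and the notion of a "step" (one character inspection) are assumptions made for illustration only.

```javascript
// Hypothetical cost model for /(?<=^a*)b/, NOT a real regex engine.
// It counts one "step" per character inspected, in the order described above:
// first the lookbehind, then the check for 'b'.
function lookbehindFirstSteps(input) {
  let steps = 0;
  for (let pos = 0; pos <= input.length; pos++) {
    // Lookbehind: walk backwards over every 'a' directly before pos.
    let i = pos;
    while (i > 0 && input[i - 1] === "a") {
      i--;
      steps++;
    }
    // Then one check for 'b' at pos.
    steps++;
    if (i === 0 && input[pos] === "b") {
      return { matchIndex: pos, steps };
    }
  }
  return { matchIndex: -1, steps };
}

console.log(lookbehindFirstSteps("aaa").steps); // 10 = 1 + 2 + 3 + 4
console.log(lookbehindFirstSteps("a".repeat(1000)).steps);
console.log(lookbehindFirstSteps("a".repeat(2000)).steps);
```

In this model, a string of n as costs (n + 1)(n + 2) / 2 steps, so doubling n roughly quadruples the work, which is the O(n^2) behavior the rule warns about.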

Unfortunately, browser regex engines don't do a lot of optimization on the pattern itself and will interpret it pretty much literally. In particular, this regex could have had linear runtime if the regex engine had been smart enough to see that checking for b first ( O(1) ) and then going into the lookbehind ( O(n) ) is a lot faster.
But on the upside, the lack of such optimizations makes writing algorithms for detecting these worst cases a lot easier :)
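
As a rough sketch of what "checking for b first" would buy, the following cost model counts character inspections when the b check comes before the lookbehind. Again, this is a simplified model and not a real engine; the name bFirstSteps is hypothetical.

```javascript
// Hypothetical cost model, NOT a real regex engine: check for 'b' first and
// only then verify the lookbehind /^a*/ by walking backwards.
function bFirstSteps(input) {
  let steps = 0;
  for (let pos = 0; pos < input.length; pos++) {
    steps++; // O(1) check for 'b' at pos
    if (input[pos] !== "b") continue;
    // Only now pay for the lookbehind.
    let i = pos;
    while (i > 0 && input[i - 1] === "a") {
      i--;
      steps++;
    }
    if (i === 0) return { matchIndex: pos, steps };
  }
  return { matchIndex: -1, steps };
}

console.log(bFirstSteps("aaa").steps); // 3: one check per character
console.log(bFirstSteps("a".repeat(1000)).steps); // 1000
```

On a string of n as, the lookbehind is never entered, so the model needs only n steps.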

Also, you solved the problem with a capturing group, which works great, but if you must use a lookbehind, you could also do this: /b(?<=^a*b)/.
A bit hacky, but it does work. Note that this isn't a general workaround like capturing groups. This only works when both a and b are single characters (or character sets, or character classes), and a and b are disjoint (they are different characters/there is no character accepted by both).

So keep using capturing groups if you can. There are simply fewer surprises with them.
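
For anyone who wants to try this, here is a small sketch that checks that the three patterns from this thread find the same b on a few sample inputs, and then times each one on a string of as. The helper names (patterns, indexOfB, timeMs), the sample inputs, and the attack length of 10000 are my own choices for illustration; the timings depend on the engine and machine, so no expected numbers are given.

```javascript
// The three patterns discussed in this thread.
const patterns = {
  lookbehindFirst: /(?<=^a*)b/,
  capturingGroup: /^(a*)b/,
  bFirst: /b(?<=^a*b)/,
};

// Returns the index of the matched 'b', or -1 if the input is rejected.
function indexOfB(name, input) {
  const m = patterns[name].exec(input);
  if (m === null) return -1;
  // With the capturing group, the match starts at 0 and 'b' follows group 1.
  return name === "capturingGroup" ? m.index + m[1].length : m.index;
}

// Sample inputs chosen for illustration.
const inputs = ["aab", "b", "aaa", "cab", "aabab", ""];
const results = inputs.map((s) =>
  Object.keys(patterns).map((name) => indexOfB(name, s))
);
console.log(results);

// Coarse wall-clock timing of a single test() call, in milliseconds.
function timeMs(regex, input) {
  const start = Date.now();
  regex.test(input);
  return Date.now() - start;
}

const attack = "a".repeat(10000); // all 'a's, no 'b': every pattern rejects it
for (const [name, regex] of Object.entries(patterns)) {
  console.log(name, timeMs(regex, attack), "ms");
}
```

If the engine behaves as described above, increasing the attack length should slow down the first pattern much more than the other two.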

Bhsd · Answer 3 · Wed Dec 28 2022 18:44:14 GMT+0800 (China Standard Time)

@RunDevelopment Thank you so much for your detailed explanation and a surprising solution!