implementing regex lookahead
mgood7123 opened this issue
to implement regex lookahead i need to

1. obtain the text captured in `mpc_parens` and `Regex`
2. look for the following at the start of the text: `?!` (negate assertion, eg [^a]) or `?:` (assertion, eg [a])
3. save a copy of the current active `mpc_input_t` structure ( can be done via

       mpc_input_t ii = *i;
       // do stuff
       *i = ii; // original input state is restored

   ) as using `mpc_input_rewind` will segmentation fault or probably fail in the case of a multi-step parser due to one needing to execute `mpc_input_rewind` for each parser succession where the input is modified
4. if it is a lookahead shift the string by 2 otherwise leave as is
5. execute it as `mpc_re_mode` with the parser as `Regex` and mode as `mode`, retaining the original mode and regex parser, in order to work recursively
6. if lookahead, restore the current active `mpc_input_t` structure with the one that was saved in step 3
7. fail or succeed depending on the lookahead mode (`?!` or `?:`)
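
A minimal sketch of the save/restore idea in steps 3, 6 and 7, written against a made-up `input_t` rather than mpc's real `mpc_input_t`. The names `input_t`, `matcher_fn`, `match_literal` and `lookahead` are all hypothetical and are not part of mpc. The point is only that a lookahead snapshots the state by value, runs the inner matcher, restores the snapshot, and then succeeds or fails depending on the mode.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser's input state; NOT mpc's mpc_input_t. */
typedef struct {
    const char *text;  /* whole input buffer (NUL-terminated) */
    size_t pos;        /* current read offset into text */
} input_t;

/* A matcher returns 1 and consumes input on success, 0 on failure. */
typedef int (*matcher_fn)(input_t *in, const void *arg);

/* Example matcher: match a literal string at the current position. */
static int match_literal(input_t *in, const void *arg) {
    const char *lit = arg;
    size_t n = strlen(lit);
    if (strncmp(in->text + in->pos, lit, n) != 0) return 0;
    in->pos += n;
    return 1;
}

/* Run matcher m as a lookahead.
 * negate = 1 models a negative lookahead, negate = 0 a positive one. */
static int lookahead(input_t *in, matcher_fn m, const void *arg, int negate) {
    input_t saved = *in;       /* step 3: snapshot the state by value */
    int matched = m(in, arg);  /* step 5: run the inner matcher */
    *in = saved;               /* step 6: restore, so nothing is consumed */
    return negate ? !matched : matched;  /* step 7: decide by mode */
}
```

With `input_t in = { "abcd", 3 };`, calling `lookahead(&in, match_literal, "e", 1)` succeeds and leaves `in.pos` at 3. A plain struct copy is only a full snapshot when the struct owns nothing else, such as a file position or a growable buffer. Whether that holds for the real `mpc_input_t` is not checked here.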
the problem i am having is trying to obtain the text captured from the `Regex` parser needed in order to look for `?!` or `?:` at the start of the text inside the parentheses
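
Once the text between the parentheses is available, the prefix check itself is small. The sketch below assumes that text has already been obtained as a NUL-terminated string, which is exactly the open problem above; `group_kind` and `classify_group` are hypothetical names, not mpc functions. It follows the notation used in this thread, where `?:` marks the positive form. In PCRE-style syntax the positive lookahead is written `(?=...)` and `(?:...)` is a non-capturing group.

```c
#include <stddef.h>

typedef enum { GROUP_PLAIN, GROUP_NEGATIVE, GROUP_POSITIVE } group_kind;

/* Inspect the text captured between the parentheses.
 * Thread notation (an assumption of this sketch): "?!" negative, "?:" positive.
 * On return, *body points at the pattern with any 2-character prefix skipped
 * (the "shift the string by 2" step). */
static group_kind classify_group(const char *inner, const char **body) {
    /* inner[1] is safe to read: inner[0] == '?' means the string has at
     * least one more byte, even if that byte is the terminating NUL. */
    if (inner[0] == '?' && inner[1] == '!') {
        *body = inner + 2;
        return GROUP_NEGATIVE;
    }
    if (inner[0] == '?' && inner[1] == ':') {
        *body = inner + 2;
        return GROUP_POSITIVE;
    }
    *body = inner;  /* ordinary group: leave the text as is */
    return GROUP_PLAIN;
}
```

For `"?!e"` this reports `GROUP_NEGATIVE` and leaves `body` pointing at `"e"`; for `"abc"` it reports `GROUP_PLAIN` and leaves the text untouched.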
for example, given `(?:abc(?!e)d)`

    `(?:abc(?!e)d)` > `?:abc(?!e)d` {advance 2} > `abc(?!e)d` {save state}
    > `abc` {check if match returns true}
        > if false
            > {restore last saved state}
            > lookahead fails
        > if true
            > `(?!e)` > `?!e` {advance 2} > `e` {save state}
            > `e` {check if match returns false}
            > {restore last saved state}
                > if true
                    > {restore last saved state}
                    > lookahead fails
                > if false
                    > `d` {check if match returns true}
                        > if true
                            > {restore last saved state}
                            > lookahead succeeds
                        > if false
                            > {restore last saved state}
                            > lookahead fails
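
The trace above can be reproduced with a toy matcher that handles only literal characters and nested groups. This is a sketch under stated assumptions: it is not how mpc parses regexes, `find_close`, `match_range` and `matches_at_start` are hypothetical names, and it keeps the thread's notation (`?!` negative, `?:` positive). Saving and restoring the state is reduced to saving and restoring a single offset.

```c
#include <stddef.h>
#include <string.h>

/* Find the ')' matching the '(' at p[0]; returns NULL if unbalanced. */
static const char *find_close(const char *p, const char *end) {
    int depth = 0;
    for (; p < end; p++) {
        if (*p == '(') depth++;
        else if (*p == ')' && --depth == 0) return p;
    }
    return NULL;
}

/* Match pattern [p, end) against text starting at *pos.
 * Returns 1 on success. On failure *pos is unspecified; callers restore it. */
static int match_range(const char *p, const char *end,
                       const char *text, size_t *pos) {
    while (p < end) {
        if (*p == '(') {
            const char *close = find_close(p, end);
            if (!close) return 0;                  /* unbalanced pattern */
            const char *body = p + 1;
            int is_look = close - body >= 2 && body[0] == '?' &&
                          (body[1] == '!' || body[1] == ':');
            if (is_look) {
                int negate = (body[1] == '!');
                size_t saved = *pos;               /* {save state} */
                int ok = match_range(body + 2, close, text, pos); /* {advance 2} */
                *pos = saved;                      /* {restore last saved state} */
                if (negate ? ok : !ok) return 0;   /* lookahead fails */
            } else if (!match_range(body, close, text, pos)) {
                return 0;                          /* ordinary group failed */
            }
            p = close + 1;
        } else {
            if (text[*pos] != *p) return 0;        /* literal mismatch or end of text */
            (*pos)++;
            p++;
        }
    }
    return 1;
}

/* Convenience wrapper: does pattern match at the start of text? */
static int matches_at_start(const char *pattern, const char *text) {
    size_t pos = 0;
    return match_range(pattern, pattern + strlen(pattern), text, &pos);
}
```

With the pattern `(?:abc(?!e)d)`, `matches_at_start` returns 1 for `"abcd"` and 0 for `"abce"`. Because the offset is restored after every lookahead, `(?:abc)abc` also matches `"abc"`: the lookahead consumes nothing.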
I think it is going to be too difficult to add lookahead because I make lots of assumptions when parsing the regex that there will be no lookahead (for example I disable backtracking). If I were you, for this I would just go with a standard regex library such as pcre (https://www.pcre.org/).
True, but isn't that unportable? mpc is meant to be fully portable, right?
Depends what platforms you are interested in. I imagine it will work okay on most platforms but you'd just have to test.
Hyperscan and Re/flex (C++) are much faster than pcre1/2 without rarely-used features. pcre2-jit is quite fast.