orangeduck / mpc

A Parser Combinator library for C

implementing regex lookahead

mgood7123 opened this issue · comments

To implement regex lookahead, I need to:

  1. obtain the text captured in mpc_parens and Regex

  2. look for the following at the start of the text:

    ?! (negated assertion, e.g. [^a])
    ?: (assertion, e.g. [a])

  3. save a copy of the current active mpc_input_t structure, which can be done via

	mpc_input_t ii = *i;
	// do stuff
	*i = ii; // original input state is restored

     because using mpc_input_rewind will segfault, or will probably fail in the case of a multi-step parser, since mpc_input_rewind would need to be executed once for each successive parser that modifies the input

  4. if it is a lookahead, shift the string by 2; otherwise leave it as is

  5. execute it as mpc_re_mode with the parser as Regex and the mode as mode, retaining the original mode and regex parser so that it works recursively

  6. if it is a lookahead, restore the current active mpc_input_t structure from the copy saved in step 3

  7. fail or succeed depending on the lookahead mode (?! or ?:)
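
Steps 3, 6 and 7 can be sketched in isolation as follows. This is only a minimal illustration under stated assumptions: `input_t`, `match_literal` and `lookahead` are hypothetical names invented for the example, and `input_t` is a simplified stand-in, not mpc's real `mpc_input_t`. The point is just that a plain struct copy is enough to snapshot and restore a position-only input state, whatever the inner match did to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser input state (NOT mpc's mpc_input_t). */
typedef struct {
  const char *text; /* whole input, NUL-terminated */
  size_t pos;       /* current offset into text */
} input_t;

/* Consume `lit` at the current position. Returns 1 on success, 0 on failure. */
static int match_literal(input_t *in, const char *lit) {
  size_t n = strlen(lit);
  assert(in->pos <= strlen(in->text)); /* position must stay inside the input */
  if (strncmp(in->text + in->pos, lit, n) != 0) return 0;
  in->pos += n;
  return 1;
}

/* Zero-width assertion: try the match, then always restore the input.
   negate = 1 behaves like "?!", negate = 0 like the post's "?:". */
static int lookahead(input_t *in, const char *lit, int negate) {
  input_t saved = *in;             /* step 3: snapshot by struct copy */
  int ok = match_literal(in, lit); /* may advance in->pos */
  *in = saved;                     /* step 6: restore the snapshot */
  return negate ? !ok : ok;        /* step 7: succeed or fail by mode */
}
```

With the input `abd` positioned after `ab`, `lookahead(&in, "e", 1)` succeeds and `lookahead(&in, "d", 0)` succeeds, and in both cases `in.pos` is unchanged afterwards. A real input structure that owns buffers or file handles would need more care than a shallow copy.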

The problem I am having is obtaining the text captured by the Regex parser, which I need in order to look for ?! or ?: at the start of the text inside the parentheses.

For example, given (?:abc(?!e)d):

`(?:abc(?!e)d)` > `?:abc(?!e)d` {advance 2} > `abc(?!e)d` {save state}
    > `abc` {check if match returns true}
        > if false
            > {restore last saved state}
            > lookahead fails
        > if true
            > `(?!e)` > `?!e` {advance 2} > `e` {save state}
                > `e` { check if match returns false}
            > {restore last saved state}
                > if true
                    > {restore last saved state}
                    > lookahead fails
                > if false
                    > `d` { check if match returns true}
                        > if true
                            > {restore last saved state}
                            > lookahead succeeds
                        > if false
                            > {restore last saved state}
                            > lookahead fails
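
The trace above can be approximated with a small recursive matcher. This is a sketch under assumptions, not mpc code: `input_t`, `match_seq` and `run` are hypothetical names, the pattern language is limited to literal characters plus `(?!...)` and `(?:...)` groups, and it follows this post's convention of treating `?:` as a positive assertion (in PCRE-style syntax, positive lookahead is written `(?=...)` and `(?:...)` is a non-capturing group). It skips three characters, `(` plus the two-character marker, where the trace advances by 2 after the parenthesis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parser input state (NOT mpc's mpc_input_t). */
typedef struct {
  const char *text; /* whole input, NUL-terminated */
  size_t pos;       /* current offset into text */
} input_t;

/* Match a sequence of items until ')' or the end of the pattern.
   On return *pat points at the terminator. Returns 1 on match, 0 otherwise.
   After a failure the rest of the pattern is still walked so that nested
   parentheses stay balanced, but no more input is consumed. */
static int match_seq(const char **pat, input_t *in) {
  int ok = 1;
  while (**pat != '\0' && **pat != ')') {
    if ((*pat)[0] == '(' && (*pat)[1] == '?' &&
        ((*pat)[2] == '!' || (*pat)[2] == ':')) {
      int negate = ((*pat)[2] == '!');
      input_t saved = *in;            /* {save state} */
      int inner;
      *pat += 3;                      /* skip "(?!" or "(?:" */
      inner = match_seq(pat, in);     /* recurse into the group body */
      if (**pat == ')') (*pat)++;     /* skip the closing parenthesis */
      *in = saved;                    /* {restore last saved state} */
      if (negate ? inner : !inner) ok = 0;
    } else {
      /* Literal character: consume it only while the match is still alive. */
      if (ok && in->text[in->pos] == **pat) in->pos++;
      else ok = 0;
      (*pat)++;
    }
  }
  return ok;
}

/* Convenience wrapper: match `pattern` against `in` from its current position. */
static int run(const char *pattern, input_t *in) {
  const char *p = pattern;
  assert(pattern != NULL && in != NULL);
  return match_seq(&p, in);
}
```

Under these assumptions, `run("(?:abc(?!e)d)", &in)` succeeds on the input `abcd` and fails on `abce`, and in both cases `in.pos` ends up where it started because the outer group is itself an assertion. Without the outer group, `run("abc(?!e)d", &in)` on `abcd` succeeds and leaves `in.pos` at 4.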

I think it is going to be too difficult to add lookahead, because when parsing the regex I make a lot of assumptions that there will be no lookahead (for example, I disable backtracking). If I were you, I would just use a standard regex library such as PCRE (https://www.pcre.org/) for this.

True, but isn't that unportable? mpc is meant to be fully portable, right?

It depends on which platforms you are interested in. I imagine it will work okay on most platforms, but you'd have to test.

If you can do without rarely used features, Hyperscan and RE/flex (C++) are much faster than PCRE1/PCRE2. PCRE2 with JIT is also quite fast.