skvadrik / re2c

Lexer generator for C, C++, Go and Rust.

Home Page: https://re2c.org

Alternative for word boundaries expressions (e.g. `^`, `$` and `\b`)?

RadhiFadlillah opened this issue

Hi everyone.

As the title says, I want to use re2go to convert the following regular expression to Go code:

    (\D|^)(\d{8})(\D|$)

As you can see, that regex uses a non-numeric expression (`\D`) and boundary expressions (`^` and `$`). Unfortunately, according to the documentation, re2go supports neither. The non-numeric expression is easy to replace with `[^0-9]`, but I have no idea how to replace the boundary expressions.

With that said, any tips or advice on how to implement the boundary expressions?

Thanks!

To express the start of input, you don't need to do anything special: just avoid a `* { continue }` rule that skips arbitrary characters, so that matching begins at the very start of the input. So, `(\D|^)` would be something like `[^0-9]?` at the beginning of a rule.

Expressing the end of input depends on your end-of-input handling method. In the simple case, if you are using the sentinel method, you can append the sentinel at the very end of your rule to denote the end of input.

So, I would say, your regexp translates to something like this in re2c:

    [^0-9]? [0-9]{8} [^0-9]? [\000] { /* ok */ }
    * { /* error */ }
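
To make it clearer what that rule accepts, here is a hand-written Go sketch of the same check. It is not re2go output, and the names `matchEightDigits`, `isDigit` and `sentinel` are made up for illustration. It assumes the sentinel method with a NUL byte appended to the input, and that NUL never occurs inside the input itself.

```go
package main

import "fmt"

// sentinel is the NUL byte assumed to terminate the input.
// Assumption: it never appears inside the input itself.
const sentinel = 0

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// matchEightDigits is a hypothetical helper that mimics the rule
//     [^0-9]? [0-9]{8} [^0-9]? [\000]
// matched from the very start of the input.
func matchEightDigits(input string) bool {
	buf := append([]byte(input), sentinel)
	i := 0

	// [^0-9]? : skip at most one leading non-digit.
	if !isDigit(buf[i]) && buf[i] != sentinel {
		i++
	}

	// [0-9]{8} : exactly eight digits. The sentinel is not a digit,
	// so this loop cannot run past the end of buf.
	for n := 0; n < 8; n++ {
		if !isDigit(buf[i]) {
			return false
		}
		i++
	}

	// [^0-9]? : skip at most one trailing non-digit.
	if !isDigit(buf[i]) && buf[i] != sentinel {
		i++
	}

	// [\000] : the next byte must be the sentinel, i.e. end of input.
	return buf[i] == sentinel
}

func main() {
	for _, s := range []string{"12345678", "x12345678y", "123456789", "ab12345678"} {
		fmt.Printf("%q -> %v\n", s, matchEightDigits(s))
	}
}
```

The first two inputs are accepted and the last two are rejected: a ninth digit or a second leading non-digit leaves no way to reach the sentinel, which is what the `* { /* error */ }` rule would catch.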

You can also use the trailing context operator `R / S` to express that your rule `R` must be followed by `S` without `S` being part of the consumed input. Here you could make the sentinel trailing context by ending the rule with `/ [\000]` instead of `[\000]`, but that doesn't make much difference.

Closing, as there has been no activity here for a while. Please reopen if you have any further questions.