skvadrik / re2c

Lexer generator for C, C++, Go and Rust.

Home Page: https://re2c.org

`--storable-state` + `re2c:yyfill:enable = 0`

rickbutton opened this issue

I'm attempting to use the new Rust backend to implement a lexer whose interface is Iterator. Because an iterator pauses lexing until the next item is requested, it seemed to make sense to use `--storable-state` and keep that state in a struct. For example, one could do something like this:

struct Lexer<'s> {
    s: &'s [u8],
    state: isize,
    cursor: usize,
    mark: usize,
    yych: u8,
    yyaccept: usize,
}

impl<'s> Lexer<'s> {
    pub fn new(s: &'s [u8]) -> Self {
        Lexer { s, state: -1, cursor: 0, mark: 0, yych: 0, yyaccept: 0 }
    }
}

impl<'s> Iterator for Lexer<'s> {
    type Item = Token;

    fn next(&mut self) -> Option<Self::Item> {
        /*!re2c
            re2c:define:YYCTYPE    = u8;
            re2c:define:YYGETSTATE = "self.state";
            re2c:define:YYSETSTATE = "self.state = @@;";
            re2c:define:YYPEEK     = "*self.s.get_unchecked(self.cursor)";
            re2c:define:YYSKIP     = "self.cursor += 1;";
            re2c:define:YYBACKUP   = "self.mark = self.cursor;";
            re2c:define:YYRESTORE  = "self.cursor = self.mark;";
            re2c:define:YYLESSTHAN = "self.cursor >= self.s.len()";
            re2c:variable:yych     = "self.yych";
            re2c:variable:yyaccept = "self.yyaccept";
            re2c:eof               = 0;
            re2c:yyfill:enable     = 0;

            WhiteSpace = (Zs | [\t\u000B\u000C\uFEFF])+;
            WhiteSpace { return token!(WhiteSpace); }
            LineTerminator = [\n] | "\r\n" | [\r\u2028\u2029];
            LineTerminator { return token!(LineTerminator); }
            MultiLineComment = "/" "*" ([^\x00*]|"*"[^\x00/])* "*"+ "/";
            MultiLineComment { return token!(MultiLineComment); }
            SingleLineComment = "//" [^\u0000\n\r\u2028\u2029]*;
            SingleLineComment { return token!(SingleLineComment); }

            * { return token!(Error); }
            $ { return None; }
         */
    }
}

Note the `re2c:yyfill:enable = 0`: YYFILL is disabled because the full input is already available. However, when this setting is combined with `--storable-state`, an error is thrown:

re2c: error: storablestate requires YYFILL to be enabled

I see that it was disabled in #306, but I wanted to express this use case since it seems like a natural way to write such a lexer in Rust.

Hi @rickbutton, you don't need storable state in this case, because you don't interrupt the lexer in the middle. Storable state is for the cases when the lexer can be interrupted in any state of the underlying automaton, and later it should resume from the point where it was interrupted. In your case the lexer runs until it reaches the final state, then it executes the corresponding semantic action and "voluntarily" returns to the caller. From the standpoint of the lexer it is finished. Next time you call it (for the next token) it will start from the initial state.
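To illustrate the kind of situation where saved automaton state is actually needed, here is a minimal hand-written sketch. It is not re2c output: `PushLexer`, `State`, `feed` and `finish` are hypothetical names invented for this example. Input arrives in chunks, so a token can be cut in half and the lexer has to remember which state it was in when the chunk ran out.

```rust
// Hypothetical push-style lexer (illustration only, not generated by re2c).
// Input arrives in chunks, so the automaton state must survive between calls.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Start,    // between tokens
    InNumber, // in the middle of a run of digits
}

struct PushLexer {
    state: State,   // saved automaton state, the role played by the state variable
    numbers: usize, // count of complete numbers seen so far
}

impl PushLexer {
    fn new() -> Self {
        PushLexer { state: State::Start, numbers: 0 }
    }

    // Consume one chunk, resuming from wherever the previous chunk stopped.
    fn feed(&mut self, chunk: &[u8]) {
        for &b in chunk {
            self.state = match (self.state, b.is_ascii_digit()) {
                (_, true) => State::InNumber,
                (State::InNumber, false) => {
                    self.numbers += 1; // a number just ended
                    State::Start
                }
                (State::Start, false) => State::Start,
            };
        }
    }

    // Signal end of input so that a trailing number is counted.
    fn finish(&mut self) {
        if self.state == State::InNumber {
            self.numbers += 1;
            self.state = State::Start;
        }
    }
}

fn main() {
    let mut lx = PushLexer::new();
    lx.feed(b"12 3"); // the chunk ends in the middle of the number "34"
    lx.feed(b"4 5");
    lx.finish();
    println!("{}", lx.numbers); // prints 3
}
```

If `state` were reset at the start of every `feed` call, the split number `34` would be counted as two numbers. An iterator that returns only at token boundaries never hits this case.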

Note that it doesn't imply that you can't store lexer state in a struct (you can and you probably should). But you don't need the state variable.
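A rough sketch of that shape, with a hand-written matcher standing in for the code re2c would generate. The simplified ASCII-only rules and the exact `Token` variants are assumptions made for this illustration. The struct keeps only the input and the cursor, and every call to `next` starts from the initial state.

```rust
// Hand-written stand-in for the generated matcher (illustration only).
#[derive(Debug, PartialEq)]
enum Token {
    WhiteSpace,
    LineTerminator,
    SingleLineComment,
    Error,
}

struct Lexer<'s> {
    s: &'s [u8],
    cursor: usize, // the only position data that must survive between calls
}

impl<'s> Lexer<'s> {
    fn new(s: &'s [u8]) -> Self {
        Lexer { s, cursor: 0 }
    }
}

impl<'s> Iterator for Lexer<'s> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        // Each call begins at the initial state; no automaton state is saved.
        let rest = &self.s[self.cursor..];
        let first = *rest.first()?; // end of input yields None
        let (token, len) = match first {
            b' ' | b'\t' => {
                let n = rest.iter().take_while(|&&b| b == b' ' || b == b'\t').count();
                (Token::WhiteSpace, n)
            }
            b'\n' => (Token::LineTerminator, 1),
            b'\r' => {
                // Treat "\r\n" as a single line terminator.
                let n = if rest.get(1) == Some(&b'\n') { 2 } else { 1 };
                (Token::LineTerminator, n)
            }
            b'/' if rest.get(1) == Some(&b'/') => {
                let n = rest.iter().take_while(|&&b| b != b'\n' && b != b'\r').count();
                (Token::SingleLineComment, n)
            }
            _ => (Token::Error, 1),
        };
        self.cursor += len;
        Some(token)
    }
}

fn main() {
    let tokens: Vec<Token> = Lexer::new(b"  // hi\r\n\t").collect();
    // prints [WhiteSpace, SingleLineComment, LineTerminator, WhiteSpace]
    println!("{:?}", tokens);
}
```

Because the lexer returns only after a complete token, `cursor` is enough to resume. Variables such as `yych` and `yyaccept` are scratch values used within a single call, so in a lexer like this they could presumably be locals rather than struct fields.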

Ah, that makes perfect sense. I misunderstood entirely. I appreciate the assistance!