skvadrik / re2c

Lexer generator for C, C++, Go and Rust.

Home Page: https://re2c.org

Is there a convenient way to get yytext?

krishna116 opened this issue

I want to get the matched token string, for example:

/*!re2c
    number = [0-9]+ ;
    number { printf("number: %s\n", yytext); return 1; }
    *      { return 0; }
*/

The documentation doesn't seem to provide this API or any convenient way to get the matched token string.
So what is the best way to get the matched token string, or must I use "@stag"?
Thank you.

If you need the whole matched text, then you don't need tags: the match begins at the initial position of YYCURSOR and ends at its final position. If you want to extract a submatch in the middle of the input, then you need tags. See the two examples below, with and without tags (in your example they are not necessary).

There is no automatic yytext because re2c does not on its own allocate memory and create copies of the input text (this would be too expensive, as the user often doesn't need the copy). If you need a copy, you can easily create one as std::string s(x, y) where x and y are the pointers in the input text (see below).
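
The pointer-pair idea can be tried in plain C++ before involving re2c at all. This is a minimal sketch, assuming two hypothetical helpers, copy_token and print_token (neither is part of re2c), that receive the start and end pointers of a match:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of re2c: copies the half-open range
// [begin, end) of the input text into a new std::string.
std::string copy_token(const char *begin, const char *end) {
    return std::string(begin, end);
}

// Hypothetical helper, not part of re2c: prints the same range without
// copying it, using the printf precision field to bound the length.
void print_token(const char *begin, const char *end) {
    std::printf("token: %.*s\n", static_cast<int>(end - begin), begin);
}
```

In a lexer, begin would be the value of YYCURSOR saved before matching and end its value inside the semantic action.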

Example without tags:

int lex(const char *str) {
    const char *YYCURSOR = str;

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;

    number = [0-9]+;
    number {
        // just print
        printf("number: %.*s\n", (int)(YYCURSOR - str), str);

        // save into an std::string
        std::string s(str, YYCURSOR);

        return 1;
    }
    * { return 0; }

    */
}

With tags:

int lex(const char *YYCURSOR) {
    const char *x, *y;
    /*!stags:re2c format = 'const char *@@;\n'; */

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;
    re2c:flags:tags = 1;

    number = [0-9]+;
    @x number @y {
        // just print
        printf("number: %.*s\n", (int)(y - x), x);

        // save into an std::string
        std::string s(x, y);

        return 1;
    }
    * { return 0; }

    */
}

Also, what document are you referring to? I don't think re2c docs mention yytext.

Please ignore the question, I misread your initial comment as "the document does provide".

skvadrik, thank you very much, but I am still not quite clear about YYCURSOR.
For example:

#include <cstdio>
#include <string>

int lex(const char *str) {
    const char *YYCURSOR = str;
    const char *begin = nullptr;
    for(;;)
    {
        begin = YYCURSOR;
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;

        number = [0-9]+;
        spaces = [ \t]+;
        
        number { std::string s(begin, YYCURSOR); printf("token = [%s], size = %zu\n",s.data(),s.size());}
        spaces { std::string s(begin, YYCURSOR); printf("token = [%s], size = %zu\n",s.data(),s.size());}
        * { return -1; }
    */
    }
    
    return 0;
}

int main()
{
    std::string str{ "1234 456" };
    lex(str.data());
    return 0;
}

The output is not fully correct: the third token is wrong.

You have to add continue; at the end of each semantic action (after printf). Otherwise the lexer just falls through into the code for the next state, whatever it might be. Also, s.c_str() is the more conventional way to get the C string from an std::string.
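
The intended control flow can be sketched without running re2c. The block below is a hand-written stand-in for the generated matcher (an assumption made so it compiles on its own, not what re2c actually emits), and lex_all is a hypothetical name. It shows the loop shape: save the token start, advance the cursor, run the action, then continue so that matching restarts at the top of the loop.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hand-written stand-in for the re2c-generated matcher (an assumption,
// not real re2c output). Collects tokens instead of printing them.
std::vector<std::string> lex_all(const char *str) {
    std::vector<std::string> tokens;
    const char *cursor = str;            // plays the role of YYCURSOR
    for (;;) {
        const char *begin = cursor;      // token start, reset on every iteration
        if (std::isdigit(static_cast<unsigned char>(*cursor))) {
            // number = [0-9]+
            while (std::isdigit(static_cast<unsigned char>(*cursor))) ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;                    // restart matching from the top of the loop
        }
        if (*cursor == ' ' || *cursor == '\t') {
            // spaces = [ \t]+
            while (*cursor == ' ' || *cursor == '\t') ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;
        }
        // default rule (*): any other character, including the
        // terminating NUL, ends the loop
        return tokens;
    }
}
```

Unlike the lexer in the question, this sketch returns the collected tokens from the default rule rather than -1, which makes the result easy to inspect.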

I struggled with these problems for half a day, and they are finally solved with your help.
Thank you again, and best wishes to you.