Is there a convenient way to get yytext?

Question

krishna116 opened this issue 3 years ago

I want to get the matched token string, for example:

/*!re2c
    number = [0-9]+ ;
    number { printf("number: %s\n", yytext); return 1; }
    *      { return 0; }
*/

The documentation does not seem to provide this API or any other convenient way to get the matched token string.
What is the best way to get it, or must I use "@stag"?
Thank you.

Ulya Trofimovich · Answer 1 · Fri Dec 10 2021 19:43:38 GMT+0800 (China Standard Time)

If you need the whole matched text, then you don't need tags: the match begins at the initial position of YYCURSOR and ends at its final position. If you want to extract a submatch in the middle of the input, then you need tags. See the two examples below, with and without tags (in your example they are not necessary).

There is no automatic yytext because re2c does not on its own allocate memory and create copies of the input text (this would be too expensive, as the user often doesn't need the copy). If you need a copy, you can easily create one as std::string s(x, y) where x and y are the pointers in the input text (see below).
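The pointer-pair idea can be tried out without re2c at all. The following is only a minimal sketch: scan_number, print_token and copy_token are hypothetical helpers that are not part of re2c, and the hand-written digit loop merely stands in for generated lexer code, with cursor playing the role of YYCURSOR.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a generated lexer: skips a run of decimal
// digits and returns the position just past the match.
const char *scan_number(const char *cursor) {
    assert(cursor != nullptr);
    while (*cursor >= '0' && *cursor <= '9') ++cursor;
    return cursor;
}

// Print the matched text straight from the input buffer, without copying.
void print_token(const char *begin, const char *end) {
    assert(begin <= end);
    std::printf("number: %.*s\n", static_cast<int>(end - begin), begin);
}

// Make an owning copy of the half-open range [begin, end).
std::string copy_token(const char *begin, const char *end) {
    assert(begin <= end);
    return std::string(begin, end);
}
```

Calling scan_number on "1234abc" returns a pointer to the 'a', so the range from the start of the input to that pointer covers exactly the four digits. The copy is made only when copy_token is called, which is why the lexer itself never has to allocate.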

Example without tags:

#include <cstdio>
#include <string>

int lex(const char *str) {
    const char *YYCURSOR = str;

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;

    number = [0-9]+;
    number {
        // just print
        printf("number: %.*s\n", (int)(YYCURSOR - str), str);

        // save into an std::string
        std::string s(str, YYCURSOR);

        return 1;
    }
    * { return 0; }

    */
}

With tags:

#include <cstdio>
#include <string>

int lex(const char *YYCURSOR) {
    const char *x, *y;
    /*!stags:re2c format = 'const char *@@;\n'; */

    /*!re2c
    re2c:define:YYCTYPE = char;
    re2c:yyfill:enable = 0;
    re2c:flags:tags = 1;

    number = [0-9]+;
    @x number @y {
        // just print
        printf("number: %.*s\n", (int)(y - x), x);

        // save into an std::string
        std::string s(x, y);

        return 1;
    }
    * { return 0; }

    */
}

Also, what document are you referring to? I don't think re2c docs mention yytext.

Ulya Trofimovich · Answer 2 · Fri Dec 10 2021 19:46:43 GMT+0800 (China Standard Time)

> Also, what document are you referring to? I don't think re2c docs mention yytext.

Please ignore the question, I misread your initial comment as "the document does provide".

Krishna · Answer 3 · Fri Dec 10 2021 20:49:54 GMT+0800 (China Standard Time)

skvadrik, thank you very much. I am still not quite clear about YYCURSOR, though.
For example:

#include<iostream>

int lex(const char *str) {
    const char *YYCURSOR = str;
    const char *begin = nullptr;
    for(;;)
    {
        begin = YYCURSOR;
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;

        number = [0-9]+;
        spaces = [ \t]+;
        
        number { std::string s(begin, YYCURSOR); printf("token = [%s], size = %d\n",s.data(),s.size());}
        spaces { std::string s(begin, YYCURSOR); printf("token = [%s], size = %d\n",s.data(),s.size());}
        * { return -1; }
    */
    }
    
    return 0;
}

int main()
{
    std::string str{ "1234 456" };
    lex(str.data());
    return 0;
}

The output is not fully correct; the third token is wrong.

Ulya Trofimovich · Answer 4 · Fri Dec 10 2021 20:54:24 GMT+0800 (China Standard Time)

You have to add continue; at the end of each semantic action (after printf). Otherwise the lexer just falls through into the next state, whatever it might be. Also, s.c_str() is the more conventional way to get a C string from an std::string.
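The effect of continue; can be illustrated with a hand-written loop. This is a sketch under assumptions, not re2c output: classify and lex_all are hypothetical names, and a plain switch stands in for the generated state machine. As in the generated code, a case that does not end with continue; would fall through into whatever code follows it.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Digit, Space, Other };

// Hypothetical character classifier matching the number and spaces rules.
Kind classify(char c) {
    if (c >= '0' && c <= '9') return Kind::Digit;
    if (c == ' ' || c == '\t') return Kind::Space;
    return Kind::Other;
}

// Splits the input into tokens; cursor plays the role of YYCURSOR.
std::vector<std::string> lex_all(const char *str) {
    assert(str != nullptr);
    std::vector<std::string> tokens;
    const char *cursor = str;
    for (;;) {
        const char *begin = cursor;  // token start, refreshed on every iteration
        switch (classify(*cursor)) {
        case Kind::Digit:
            while (classify(*cursor) == Kind::Digit) ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;  // back to the top of the loop; without it, control falls through
        case Kind::Space:
            while (classify(*cursor) == Kind::Space) ++cursor;
            tokens.emplace_back(begin, cursor);
            continue;
        case Kind::Other:
            return tokens;  // default rule: NUL or an unexpected character stops lexing
        }
    }
}
```

For the input "1234 456" this yields three tokens: "1234", a single space, and "456". When printing a token with printf, s.c_str() together with %zu for s.size() avoids the mismatch between %d and the size_t that size() returns.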

Krishna · Answer 5 · Fri Dec 10 2021 21:12:03 GMT+0800 (China Standard Time)

I struggled with these problems for half a day, and with your help they are finally solved.
Thank you again and best wishes to you.