skvadrik / re2c

Lexer generator for C, C++, Go and Rust.

Home Page: https://re2c.org

Allow storable state / start conditions support to be configured at a local block level

cyanogilvie opened this issue

Is it possible to make the --storable-state and --start-conditions options something that can be scoped to a local block, something like:

/*!local:re2c:foo
    re2c:flags:storable-state   = 1;
    re2c:flags:start-conditions = 1;
    ...
*/

/*!local:re2c:bar
    re2c:yyfill:enable = 0;
    ...
*/

The problem I'm facing is that I like re2c a lot, so a lot of problems start to look re2c-shaped. But mixing lexers with storable state (say, parsing a network protocol that arrives in chunks) with structurally simpler ones (perhaps routing requests from that protocol by matching paths and extracting parts) becomes a bit awkward. Splitting them into different source files pushes the complexity into the build system: currently I just teach make to turn any .re into a .c with a rule, which means using the same re2c command-line arguments for all .re files. It also imposes different considerations on what code should be grouped into a file, sometimes forcing tightly coupled code into different modules, which hurts clarity.
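To illustrate why the chunked case needs state that outlives a single call, here is a small hand-written C sketch. It is not re2c output and does not use any re2c API; the names chunk_lexer, chunk_lexer_init and chunk_lexer_feed are hypothetical. It counts "\r\n"-terminated lines and has to remember, between chunks, whether the previous chunk ended in '\r'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states of a hand-written scanner (not re2c output). */
enum chunk_state {
    ST_TEXT,     /* inside a line */
    ST_SEEN_CR   /* previous byte was '\r'; a '\n' would end the line */
};

/* Everything that must survive between two chunks. */
struct chunk_lexer {
    enum chunk_state state;
    size_t lines;            /* complete "\r\n"-terminated lines so far */
};

static void chunk_lexer_init(struct chunk_lexer *lx)
{
    assert(lx != NULL);
    lx->state = ST_TEXT;
    lx->lines = 0;
}

/* Consume one chunk and return the running count of complete lines. */
static size_t chunk_lexer_feed(struct chunk_lexer *lx, const char *buf, size_t len)
{
    size_t i;
    assert(lx != NULL);
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; ++i) {
        char c = buf[i];
        if (lx->state == ST_SEEN_CR && c == '\n') {
            lx->lines++;
            lx->state = ST_TEXT;
        } else {
            /* A repeated '\r' keeps us waiting for '\n'. */
            lx->state = (c == '\r') ? ST_SEEN_CR : ST_TEXT;
        }
    }
    return lx->lines;
}
```

Feeding "GET /\r" and then "\nHost: x\r\n" counts two lines even though the first terminator is split across the two chunks. A path router that always sees its whole input needs no such saved state, which is why the two kinds of lexer want different settings.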

I can see how this may be tricky to do, considering !use and includes with conflicting settings for storable state and/or start conditions, but if that could be addressed by just making that an error it wouldn't hurt my use cases.

If there isn't anything too fundamental preventing this, I'm happy to have a go at it, if you could point me in the direction of what would need to be done and anything I'd need to be on the lookout for.

It may not be quite what you need, but for storable state you can give a list of block names in the /*!getstate:re2c*/ directive, e.g.:

/*!re2c:x
    * { x }
*/

/*!re2c:y
    * { y }
*/

/*!re2c:z
    * { z }
*/

/*!getstate:re2c:y:x*/

This will generate a switch on the state variable that will accumulate YYFILL labels from only y and x, but not from z. The first block in the list (y) will be the one that the default switch case goes to.
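As a rough illustration of the dispatch described above, here is a hand-written C stand-in. It is not actual re2c output: the function resume_target, the enum names and the state numbers 0 and 1 are invented, and treating -1 as "no saved state" is an assumption of this sketch. The only facts taken from the description are that blocks y and x contribute cases, block z does not, and the default case goes to the start of the first listed block (y).

```c
#include <assert.h>

/* Hypothetical resume points; real generated code uses goto labels. */
enum resume_point {
    START_OF_Y,   /* default case: start of the first listed block */
    FILL_IN_Y,    /* a YYFILL label inside block y */
    FILL_IN_X     /* a YYFILL label inside block x */
};

/* Hand-written stand-in for the switch on the state variable.
 * Block z is not in the list, so it contributes no case here. */
static enum resume_point resume_target(int state)
{
    assert(state >= -1);  /* assumption: -1 means "no saved state" */
    switch (state) {
    case 0:  return FILL_IN_Y;   /* made-up state number */
    case 1:  return FILL_IN_X;   /* made-up state number */
    default: return START_OF_Y;
    }
}
```

Any state value without a case, including the assumed initial -1, resolves to the start of block y in this sketch.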

This was added in https://re2c.org/releases/release_notes.html#release-2-2.

It may be possible to allow per-block storable state (I don't immediately see why not), but it will require some work and examination of various corner cases.

As for conditions, you can already mix blocks with and without conditions in your program (you need to use the -c option globally, but it does not affect blocks without conditions):

// block with conditions
/*!re2c
    <a,b> "a" { a }
    <a> * { x }
*/

// block without conditions
/*!re2c
    * { y }
*/

This will result in:

$ ./re2c 1.re -ic
/* Generated by re2c  3.1 on  on Sat Dec 23 17:46:28 2023 */
// block with conditions

{
	YYCTYPE yych;
	switch (YYGETCONDITION()) {
		case yyca: goto yyc_a;
		case yycb: goto yyc_b;
	}
/* *********************************** */
yyc_a:
	if (YYLIMIT <= YYCURSOR) YYFILL(1);
	yych = *YYCURSOR;
	switch (yych) {
		case 'a': goto yy2;
		default: goto yy1;
	}
yy1:
	++YYCURSOR;
	{ x }
yy2:
	++YYCURSOR;
	{ a }
/* *********************************** */
yyc_b:
	if (YYLIMIT <= YYCURSOR) YYFILL(1);
	yych = *YYCURSOR;
	switch (yych) {
		case 'a': goto yy5;
		default: goto yy4;
	}
yy4:
yy5:
	++YYCURSOR;
	{ a }
}


// block without conditions

{
	YYCTYPE yych;
	if (YYLIMIT <= YYCURSOR) YYFILL(1);
	++YYCURSOR;
	{ y }
}
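For readers who want something they can compile without running re2c, the behaviour of the two blocks can be approximated by hand as below. This is a sketch, not generated code: the function names are hypothetical, the actions a, x and y are replaced by returned characters, and '?' is an invented marker for input that no rule in condition b covers (the rules above leave that case undefined).

```c
#include <assert.h>
#include <stddef.h>

/* Condition identifiers, named as in the output above; written by hand here. */
enum YYCONDTYPE { yyca, yycb };

/* Hand-written equivalent of the block with conditions. */
static char lex_with_conditions(const char *cursor, enum YYCONDTYPE cond)
{
    char yych;
    assert(cursor != NULL);
    yych = *cursor;
    switch (cond) {
    case yyca:
        /* <a,b> "a" { a }  and  <a> * { x } */
        return yych == 'a' ? 'a' : 'x';
    case yycb:
        /* only <a,b> "a" { a } applies; anything else has no rule */
        return yych == 'a' ? 'a' : '?';
    }
    return '?';  /* unknown condition value */
}

/* Hand-written equivalent of the block without conditions: * { y } */
static char lex_without_conditions(const char *cursor)
{
    assert(cursor != NULL);
    return 'y';
}
```

Both functions live in one translation unit, and the unconditional one never consults the condition, which mirrors the point that -c leaves blocks without conditions unaffected.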

As I understand it, your main issue is mixing storable-state and non-storable-state blocks, which is not addressed by the above. I'm happy to help with review and advice if you want to give it a go. If not, I will likely address it later myself, but it may not be soon (my time allocation is unpredictable now, as I have a 4-month-old baby).

Technically, I remember that it was infeasible a few years ago when I considered it, but now that the work has been done on getstate:re2c with a custom list of blocks, it may not be that hard.