skvadrik / re2c

Lexer generator for C, C++, Go and Rust.

Home Page: https://re2c.org

include directive limitations

h3har opened this issue

commented

It seems to me like the include directive doesn't allow me to include re2c commands inline with other re2c commands. For example:

/*!re2c
re2c:define:YYCTYPE = char;
re2c:define:YYCURSOR = pos;
re2c:define:YYMARKER = marker;
re2c:yyfill:enable = 0;
re2c:flags:tags = 1;

// some re2c expressions here

include:re2c auto-generated-file.re

// some more expressions here

*/

Use case: I have a list of tokens, and (to reduce duplication with other parts of my program) it's more convenient to generate re2c code and then include it inline in the surrounding re2c block. However, re2c's include directive doesn't seem to allow this. I've tried different syntaxes, including:

!include:re2c FILE
include:re2c FILE
re2c:include FILE

each with and without a trailing semicolon. However, the re2c parser seems to reject the line right at its start, so I don't think the semicolon is the issue. It looks like re2c only supports include directives in their own comments, as in the docs:

/*!include:re2c FILE */

I can't use this command in my scenario, since it would require me to close the re2c command and re-open it, which causes re2c to generate two or three separate switch statements!

This isn't a deal-breaker for me. I can obviously work around this. Just thought I'd raise the issue since it might be an unintentional limitation that is easy to fix.

Hi @hddharvey, you are right that currently the include directive is not supposed to be used in the middle of a block.

> I can't use this command in my scenario, since it would require me to close the re2c command and re-open it, which causes re2c to generate two or three separate switch statements!

Can you attach a more complete example?

re2c only generates code if there are some rules in a block. If a block has only definitions and configurations, no code is generated for it (aside from the special directive blocks).

Depending on your example, you may find reuse blocks helpful (https://re2c.org/manual/manual_c.html#reusable-blocks). If you have a /*!rules:re2c ... */ block in the included file and then add more rules to it in a /*!use:re2c ... */ block after the include, it might work. But I'm not sure without having a look at your example.
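A rough sketch of the layout suggested here, assuming the generated file wraps its rules in a rules block and that re2c is run with reusable blocks enabled (the -r / --reusable option). The file name tokens.re is a placeholder, the actions are abbreviated as elsewhere in this thread, and the sketch has not been run against the real lexer:

```c
/* ---- tokens.re (generated): a rules block on its own emits no code ---- */
/*!rules:re2c
    "token1" { do_something_for_token_1(); continue; }
    "token2" { do_something_for_token_2(); continue; }
*/

/* ---- lexer.re (hand-written) ---- */
/*!include:re2c "tokens.re" */

/*!use:re2c
    re2c:define:YYCTYPE = char;
    re2c:define:YYCURSOR = pos;
    re2c:define:YYMARKER = marker;
    re2c:yyfill:enable = 0;

    // hand-written rules are added after the included ones
    pattern = [A-Za-z] [1-9]+;
    pattern { do_something(tok, pos, ...); continue; }
*/
```

In this arrangement the hand-written rules can only follow the generated ones, so it does not help when the generated rules have to sit between two groups of hand-written rules.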

commented

Hi @skvadrik - thanks for the reply.

As to what the specific code is, the // some re2c expressions here comment in my above example refers to a set of re2c rules, such as:

pattern = [A-Za-z] [1-9]+;
pattern { do_something(tok, pos, ...); continue; }
// ... more like this

The file that I need to include is just a bunch of rules:

"token1" { do_something_for_token_1(); continue; }
"token2" { do_something_for_token_2(); continue; }
// ... more like this for each token in the list
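For illustration, a generator for such a file could be as small as the following C sketch. The helper names format_rule and write_rules are hypothetical, the action naming simply mirrors the lines above, and token strings are assumed to contain no quotes or backslashes that would need escaping:

```c
#include <stdio.h>

/* Hypothetical helper: formats one re2c rule for `token`, using the
 * 1-based `index` to build the action name. Returns what snprintf returns. */
int format_rule(char *buf, size_t size, const char *token, int index)
{
    return snprintf(buf, size,
                    "\"%s\" { do_something_for_token_%d(); continue; }\n",
                    token, index);
}

/* Writes one rule per token to `out`, in list order.
 * Returns 0 on success, -1 on a formatting or write failure. */
int write_rules(FILE *out, const char *const *tokens, size_t count)
{
    char line[256];
    for (size_t i = 0; i < count; ++i) {
        int n = format_rule(line, sizeof line, tokens[i], (int)i + 1);
        if (n < 0 || (size_t)n >= sizeof line) {
            return -1; /* encoding error or truncated line */
        }
        if (fputs(line, out) == EOF) {
            return -1;
        }
    }
    return 0;
}
```

Calling write_rules on a file opened for writing, with the shared token list, would produce the rules file as a build step before re2c runs.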

Because the ordering of the rules matters, it is important that these rules are included in the middle of the hand-written re2c block. Do reuse blocks allow this? I'm not able to test this right now, but based on the documentation, it's not clear that reuse blocks can be used inline in the way that I want. The docs say that

> As of re2c-1.2 it is possible to mix such blocks with normal /*!re2c*/ blocks

However, it's not clear whether this has the same limitation as include directives, where I would have to close the main re2c comment and then re-open it, causing re2c to generate separate DFAs.
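On why the position of the generated rules matters: re2c prefers the longest match, and when several rules match the same longest prefix, the rule listed first wins. The C sketch below models only that tie-break. match_word is a hypothetical catch-all rule invented for the illustration (it is not one of the rules quoted above), and pick_rule is a toy selector, not how re2c itself is implemented:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A rule returns the length of its match at the start of `s`, or 0 for no match. */
typedef size_t (*rule_fn)(const char *s);

/* Hypothetical catch-all rule: one or more lowercase letters or digits. */
size_t match_word(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n])) {
        ++n;
    }
    return n;
}

/* Literal rule for the generated token "token1". */
size_t match_token1(const char *s)
{
    static const char lit[] = "token1";
    size_t len = sizeof lit - 1;
    return strncmp(s, lit, len) == 0 ? len : 0;
}

/* Returns the index of the winning rule: the longest match wins, and on a
 * tie the earliest rule is kept. Returns -1 if no rule matches. */
int pick_rule(const rule_fn *rules, size_t count, const char *s)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t len = rules[i](s);
        if (len > best_len) { /* strict '>' keeps the earlier rule on ties */
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the rules ordered { match_word, match_token1 }, the input "token1" is claimed by the catch-all rule and the token-specific action never runs; with { match_token1, match_word } the token rule wins. For a longer input such as "token12" the catch-all rule wins in either order, because its match is longer.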

Also, out of curiosity, can multiple reuse blocks be declared? The syntax shown in the docs doesn't seem to assign names to reuse blocks, which would allow you to distinguish between multiple reuse block definitions.

> Because the ordering of the rules matters, it is important that these rules are included in the middle of the hand-written re2c block. Do reuse blocks allow this?

No, they don't: use blocks can only append to rules blocks.

> Also, out of curiosity, can multiple reuse blocks be declared? The syntax shown in the docs doesn't seem to assign names to reuse blocks, which would allow you to distinguish between multiple reuse block definitions.

Technically you can have multiple rules blocks in the same file, but you can only use the last one (so every rules block should be followed by one or more use blocks). There are no named blocks yet, but I have plans for that.

Anyway, thanks for the good ideas:

  • named rules blocks
  • allowing some directives in the middle of a block

Both seem feasible and useful. I'll think it over.

@hddharvey I added experimental support for an in-block include directive: 1cdde75 (currently on branch inline-include-directive). The syntax is slightly different from the usual directive: !include "x.re" instead of /*!include:re2c "x.re" */. See the added test for a usage example. Let me know if it works for your larger example, and if the syntax seems reasonable.

commented

Hi @skvadrik. Seems to work fine for me! Thanks a lot for this.

The only issue I ran into was some initial confusion: adding a semicolon to the end of the include directive produced a confusing error message about an unexpected '!', which made me think I hadn't checked out the correct branch. I removed the semicolon and it worked fine, though. I assume this error message is an artefact of how re2c's parser works.

> The only issue I ran into was some initial confusion: adding a semicolon to the end of the include directive produced a confusing error message about an unexpected '!', which made me think I hadn't checked out the correct branch.

This is actually a good insight: taking into account the rest of the re2c syntax, it is more natural to add a semicolon after the end of the directive. So I changed the syntax to have a semicolon, and also improved the message given on syntax errors, which now reads: error: ill-formed include directive, expected format: `!include "<file>" ; <newline>`. This is committed as 19bf1a8. I'll add docs and merge this into master if there are no other improvement ideas.
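With that change, the block from the original report would look roughly like this. It is a sketch based on the format string in the error message above, assuming the generated file contains bare rules like the "token1" and "token2" lines shown earlier. The hand-written rule is the one quoted earlier in the thread, with its action abbreviated the same way:

```c
/*!re2c
    re2c:define:YYCTYPE = char;
    re2c:define:YYCURSOR = pos;
    re2c:define:YYMARKER = marker;
    re2c:yyfill:enable = 0;
    re2c:flags:tags = 1;

    // hand-written rules that must come first
    pattern = [A-Za-z] [1-9]+;
    pattern { do_something(tok, pos, ...); continue; }

    // generated rules are spliced in at this position
    !include "auto-generated-file.re";

    // some more expressions here
*/
```

Because everything stays inside one block, the hand-written and generated rules end up in a single block rather than in separate ones, which is what the original report was after.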

commented

Nothing else from me that I can think of right now. I suppose it would be good if the documentation was more explicit about what you can and cannot do with the reuse blocks (e.g., it's not super clear - unless I'm missing something - that re2c just looks for the previous reuse block only). However, if you're about to change that, I'm sure you were planning to edit the docs anyway.

Otherwise, go ahead!

Thanks.

The changes are merged into the master branch. I also updated the examples and online docs for the include directive: https://re2c.org/manual/manual_c.html#include-files.

> it would be good if the documentation was more explicit about what you can and cannot do with the reuse blocks

Good point, thanks. This is more related to #51, so let's move the discussion to that bug.