antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

cpp grammar: Unable to parse pointer to array types as template argument

deadlocklogic opened this issue · comments

Consider:

Test<char(*)[21]> type;

This statement can't be parsed, notice that a pointer to function parses correctly: Test<void(*)(void)> type;.
Thanks.

@kaby76 Never encountered such use case in your scraping algorithm?
For the record, Test<char(&)[21]> type; and Test<char(&&)[21]> type; aren't parsed too.
Note: 21 can be replaced with any other constant value.

The problem is that the set of test inputs is small, only 17 files. Of those, it only tests about 60% of the parser rules as measured by trcover. A proper test suite should have been supplied when the grammar was submitted. At some point when we redo this grammar, a test suite will be added, first choice likely from the GNU project.

A practical choice. You can choose from big projects a test suite like GNU, clang, Qt, chrome etc...
The only thing which might stands in the way is the usage of the preprocessor: so maybe a preliminary phase hooked to a compiler which generates processed code before running the tests.

By the way, do you think my issue is trivially fixable? I guess it shouldn't be this difficult.

The only thing which might stands in the way is the usage of the preprocessor: so maybe a preliminary phase hooked to a compiler which generates processed code before running the tests.

Correct. There is no preprocessor parsing and translation currently. There are rules to ignore preprocessor directives in the grammar, but the parse can fail for a number of reasons, e.g., if a #ifdef tries to break a statement across the "then" and "else", using non-ISO extensions, etc.

===

Someone didn't refactor the noptr-abstract-declarator rule from the spec. The spec says:

noptr-abstract-declarator:
    noptr-abstract-declaratoropt parameters-and-qualifiers
    noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
    ( ptr-abstract-declarator )

Instead, we ended up with this, which is wrong.

noPointerAbstractDeclarator
: noPointerAbstractDeclarator (
parametersAndQualifiers
| noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
)
| parametersAndQualifiers
| LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| LeftParen pointerAbstractDeclarator RightParen
;

If anything, it should look like this:

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

That seems to "work" but it may break other things.

But even that I wouldn't do. It should be defined using Kleene operators.

Following the specs, I was able to fix this issue.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket
    | LeftParen pointerAbstractDeclarator RightParen
    ;

abstractPackDeclarator
    : noPointerAbstractPackDeclarator
    | pointerOperator abstractPackDeclarator
    ;

noPointerAbstractPackDeclarator
    : noPointerAbstractPackDeclarator parametersAndQualifiers
    | Ellipsis
    ;

Wonder why the grammar doesn't follow the specs in the first place.
Is there any roadmap for a newer/cleaner grammar?

Wonder why the grammar doesn't follow the specs in the first place.
Is there any roadmap for a newer/cleaner grammar?

This is precisely what I have been complaining about for years.

The proper way forward is to scrape the grammar from scratch. Then, employ known, logical, repeatable, precise transformations to obtain a functioning grammar. I had performed some work on this over here: https://github.com/kaby76/scrape-c-plus-plus-spec. We can try to use some of the transformations in the existing cpp grammar, but there's really no documentation as to why the rules are structured as they are.

I now see that https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/ has a firewall preventing downloading even the drafts. Really just terrible. The only current free download of drafts is https://github.com/cplusplus/draft. Who knows how long this will be up.

Maybe the main goal of ANTLR as a project, is to focus on the generator's runtime/tools, grammars are just treated as plugins which anyone can easily create/modify.

This draft looks plausible even though not ISO compliant.
My use case, isn't high demanding in terms of grammar.
I am using ANTLR to parse something like SWIG's typemaps, so just using a compact version of the grammar mainly for parsing expression statement like patterns.

This draft looks plausible even though not ISO compliant.

Why would we use a non-authoritative website source when we have authoritative drafts in PDF format at https://github.com/cplusplus, and official standard doc at https://www.iso.org/standard/79358.html and have code already to read and strip the EBNF from the official drafts and formal spec?


My plan is to make a PR to fix this parsing issue. One step at a time.

I see now what the developer did with noPointerAbstractDeclarator. The issue was The following sets of rules are mutually left-recursive [noPointerAbstractDeclarator].

Spec:

noptr-abstract-declarator:
    noptr-abstract-declaratoropt parameters-and-qualifiers
    noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
    ( ptr-abstract-declarator )

Step 1: Translated to mutually-left recursive Antlr4 format.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator? parametersAndQualifiers
    | noPointerAbstractDeclarator? LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

Step 2: Remove ?-operator.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

This is perfectly fine as is (Antlr4 handles left-recursion), but can be further improved with Kleene operator.

Step 3: Reorder alts.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | parametersAndQualifiers
    | LeftParen pointerAbstractDeclarator RightParen
    ;

Step 4: Group alts.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
    | (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen)
    ;

Step 5: Transform to *-operator closure.

noPointerAbstractDeclarator
    : (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen) (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
    ;

Each of these steps can be handled by the Trash toolkit from the scraped grammar.