antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

Fortran 90 parser emits a syntax error on valid Fortran with no spaces around a logical operator

suehshtri opened this issue.

When the ANTLR 4 Fortran 90 grammar encounters
if (mydummy > 90.and.gggg > 0) then
it reports a syntax error.
While I agree with complaints about the lack of whitespace, I'd argue that the grammar should accept and parse the code as written,
honoring the traditions of our ancestors.
gfortran -std=f95 -c noSpaceBool.f90 appears to accept it without complaint.
I have an example .f90 file if that would help.
I would add the Fortran tag if I could.

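The problem is that the lexer tokenizes the input incorrectly. For 90.and.gggg, the first alternative of RDCON matches "90." (digits, a dot, and zero further digits), so the dot that should begin .and. is consumed by the number. A possible fix is to change RDCON, which currently reads: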

RDCON: NUM+ '.' NUM* EXPON? | NUM* '.' NUM+ EXPON? | NUM+ EXPON;

A semantic predicate could be added so that RDCON does not match when "and" (or another operator name) follows the '.'.
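To make the predicate idea concrete, here is a minimal Java sketch, not the grammar's actual fix, that lexes a number from a plain string and refuses to take the '.' when that dot opens an operator such as .and. The class and method names and the operator list (DOT_OPERATORS) are hypothetical, and the EXPON part of RDCON is left out for brevity.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the lookahead a predicate on RDCON would perform.
public class DotOperatorLookahead {
    // Assumed set of Fortran operator names that appear between two dots.
    static final List<String> DOT_OPERATORS = List.of(
        "and", "or", "not", "eqv", "neqv", "eq", "ne", "lt", "le", "gt", "ge");

    // True if src has '.' at dotPos followed by letters and a closing '.',
    // and those letters (case-insensitively) name a known dot operator.
    static boolean dotOperatorAt(String src, int dotPos) {
        if (dotPos >= src.length() || src.charAt(dotPos) != '.') return false;
        int i = dotPos + 1;
        while (i < src.length() && Character.isLetter(src.charAt(i))) i++;
        if (i >= src.length() || src.charAt(i) != '.') return false;
        String name = src.substring(dotPos + 1, i).toLowerCase(Locale.ROOT);
        return DOT_OPERATORS.contains(name);
    }

    // Returns the numeric literal starting at start, leaving an operator's
    // leading '.' in the input instead of absorbing it into the number.
    static String lexNumber(String src, int start) {
        int i = start;
        while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
        if (i < src.length() && src.charAt(i) == '.' && !dotOperatorAt(src, i)) {
            i++; // this '.' belongs to the real literal
            while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
        }
        return src.substring(start, i);
    }

    public static void main(String[] args) {
        String line = "if (mydummy > 90.and.gggg > 0) then";
        System.out.println(lexNumber(line, line.indexOf("90"))); // 90
        System.out.println(lexNumber("x = 90.5", 4));             // 90.5
        System.out.println(lexNumber("a > 1.AND.b", 4));          // 1
    }
}
```

In the grammar, the same decision would sit in a predicate on RDCON; the sketch only shows the character-level test, which looks ahead without consuming anything.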

I'll read up on pattern 7 (predicated parser), page 65.

I've experimented; my implementation is unenlightened, resorting to a heavy-handed function in the lexer:
@members { Boolean NextIsBop() { var nt = NextToken().Text; return "and".Equals(nt) || "or".Equals(nt); } }
and modifying RDCON:
RDCON: NUM+ '.' NUM* EXPON? { !NextIsBop() }? | NUM* '.' NUM+ EXPON? | NUM+ EXPON;

It helps, but the C#-specific code seems like something that could be replaced with a target-independent approach.
I am open to ideas.

All action code must be wrapped in a method, including operators such as !, because whether an operator exists and what it means depends on the target language. Fortunately, the grammar is already written in "target agnostic format": https://github.com/antlr/antlr4/blob/dev/doc/target-agnostic-grammars.md

The main issue I need to check is whether one can call NextToken() from the lexer. That is why my PR looks at the char stream. Update: indeed, you can't.
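Why calling a token-producing method from inside the lexer is a problem can be illustrated with a toy lexer that is not ANTLR's. Its "peek" calls nextToken(), which consumes the token it inspects, so that token never reaches the output. This is one plausible mechanism for tokens disappearing from the stream. The class, its method names, and the whitespace-split input are all hypothetical; the lookahead variant inspects the input without moving the cursor, which is the idea behind looking at the char stream.

```java
import java.util.ArrayList;
import java.util.List;

// Toy whitespace-splitting lexer (hypothetical, not ANTLR) showing the pitfall.
public class PeekPitfall {
    private final String[] words;
    private int pos = 0;

    PeekPitfall(String text) { this.words = text.trim().split("\\s+"); }

    // Returns the next word, or null at end of input, and advances the cursor.
    String nextToken() {
        return pos < words.length ? words[pos++] : null;
    }

    // Wrong: "peeks" by calling nextToken(), which consumes the word.
    boolean nextIsBopConsuming() {
        String t = nextToken();
        return "and".equals(t) || "or".equals(t);
    }

    // Right: reads the word at the cursor without moving it.
    boolean nextIsBopLookahead() {
        String t = pos < words.length ? words[pos] : null;
        return "and".equals(t) || "or".equals(t);
    }

    static List<String> lex(String text, boolean consuming) {
        PeekPitfall lx = new PeekPitfall(text);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); t != null; t = lx.nextToken()) {
            out.add(t);
            // Stand-in for "a real literal ending in '.' was just matched".
            if (t.endsWith(".")) {
                if (consuming) lx.nextIsBopConsuming(); else lx.nextIsBopLookahead();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("90. and gggg", true));  // [90., gggg]
        System.out.println(lex("90. and gggg", false)); // [90., and, gggg]
    }
}
```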

@kaby76, thank you. I have to think about this: I thought I was calling NextToken from the lexer, not the parser, though not with the method in a superclass.
I think we need to avoid .and., .or., and maybe .not.

You were calling NextToken() from the lexer. However, the code didn't work in the trgen-generated tests because of buffering, which caused some tokens to not appear. (If you generate a CSharp parser with trgen -t CSharp, update Fortran90LexerBase.cs with your code, build with make, and test with bash run.sh ../examples/missing-spaces.f90 -tokens, you will see the missing tokens.) I've updated #4085 for all but the JavaScript and Go ports. There are no PHP or TypeScript ports: the ANTLR PHP target isn't in good shape, and the TypeScript port cannot handle split grammars that use a superClass. Instead, I added an Antlr4ng port, which is as fast as the CSharp or Java ports.

Thank you!