Fortran 90 parser emits Syntax Error on valid Fortran with no spaces around logic operator

Question

suehshtri opened this issue 4 months ago · comments

When the antlr4 f90 grammar encounters
if (mydummy > 90.and.gggg > 0) then
it throws a syntax error.
While I agree with complaints about the lack of white-space, I offer that the grammar should accept and parse the code as written,
honoring the traditions of our ancestors.
gfortran -std=f95 -c noSpaceBool.f90 seems to accept it fine.
I have an example f90 file if it will help.
I would add the Fortran tag if I could.

Ken Domino · Answer 1 · Tue May 07 2024 17:59:00 GMT+0800 (China Standard Time)

The problem is that the lexer tokenizes incorrectly. A possible fix is to change RDCON, which is currently this:

RDCON: NUM+ '.' NUM* EXPON? | NUM* '.' NUM+ EXPON? | NUM+ EXPON;

A semantic predicate could be added to not match if there is an "and" (or other operator name) following the '.'.
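
To make the mis-tokenization concrete, here is a minimal Python sketch (a standalone simulation, not ANTLR code). It transcribes the three RDCON alternatives into regular expressions and takes the longest match, as a lexer would. The EXPON pattern and the helper name longest_rdcon are assumptions made for this example; the grammar's real EXPON may differ.

```python
import re

NUM = r"[0-9]"
# Assumed shape of EXPON: an exponent letter, optional sign, digits.
EXPON = r"(?:[eEdD][+-]?[0-9]+)"

# The three alternatives of RDCON as quoted above.
RDCON_ALTERNATIVES = [
    re.compile(rf"{NUM}+\.{NUM}*{EXPON}?"),
    re.compile(rf"{NUM}*\.{NUM}+{EXPON}?"),
    re.compile(rf"{NUM}+{EXPON}"),
]

def longest_rdcon(text, pos=0):
    """Return the longest text any RDCON alternative matches at pos."""
    best = ""
    for alt in RDCON_ALTERNATIVES:
        m = alt.match(text, pos)
        if m and len(m.group(0)) > len(best):
            best = m.group(0)
    return best

source = "90.and.gggg"
consumed = longest_rdcon(source)
remainder = source[len(consumed):]
print(repr(consumed), repr(remainder))
```

In this sketch the first alternative swallows "90." (digits, a dot, zero fraction digits), so what is left for the next token is "and.gggg" and the leading dot of .and. is gone.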

S Suehs · Answer 2 · Tue May 07 2024 21:06:52 GMT+0800 (China Standard Time)

I'll read about pattern 7, page 65, predicated parser.

S Suehs · Answer 3 · Tue May 07 2024 23:26:08 GMT+0800 (China Standard Time)

I've experimented; my implementation is unenlightened, resorting to a heavy-handed function in the lexer:
@members { Boolean NextIsBop() { var nt = NextToken().Text; return "and".Equals(nt) || "or".Equals(nt); } }
and modifying RDCON
RDCON: NUM+ '.' NUM* EXPON? { !NextIsBop() }? | NUM* '.' NUM+ EXPON? | NUM+ EXPON;

It helps, but the C#-specific code seems avoidable; this could presumably be done in a way that is generic across target languages.
I am open to ideas.

Ken Domino · Answer 4 · Wed May 08 2024 08:37:01 GMT+0800 (China Standard Time)

All action code must be wrapped in a method, including operators like !, because their existence and meaning depend on the target language. Fortunately, the grammar has already been written in "target agnostic format": https://github.com/antlr/antlr4/blob/dev/doc/target-agnostic-grammars.md

The main issue I need to check is whether one can call NextToken() from the lexer. That is why my PR looks at the char stream. Update: indeed, you can't.

S Suehs · Answer 5 · Wed May 08 2024 21:04:35 GMT+0800 (China Standard Time)

@kaby76, thank you. I have to think about this, because I thought I was calling NextToken from the lexer, not the parser, though not in the context of having the method in the superclass.
I think we need to dodge .and., .or., and maybe .not.
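
For illustration, the char-stream lookahead idea mentioned earlier in the thread can be sketched in plain Python (a simulation only, not the code in the PR). The names OPERATOR_NAMES, dot_starts_operator and scan_number are hypothetical, and the exponent part of RDCON is left out to keep the sketch short.

```python
DIGITS = "0123456789"
# Operator names discussed in this thread; other dotted operators could be added.
OPERATOR_NAMES = ("and", "or", "not")

def dot_starts_operator(text, dot_pos):
    """True if the '.' at dot_pos opens a dotted operator such as .and."""
    rest = text[dot_pos + 1:].lower()  # Fortran keywords are case-insensitive
    return any(rest.startswith(name + ".") for name in OPERATOR_NAMES)

def scan_number(text, pos=0):
    """Consume digits, then '.' and fraction digits unless the '.' belongs to an operator."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    if end == pos:
        return ""  # no leading digits, not handled by this sketch
    if end < len(text) and text[end] == "." and not dot_starts_operator(text, end):
        end += 1
        while end < len(text) and text[end] in DIGITS:
            end += 1
    return text[pos:end]

print(repr(scan_number("90.and.gggg")))
print(repr(scan_number("90.5+x")))
```

The check only peeks at the characters after the dot and never asks the lexer for another token, which is the property that matters here. With it, "90.and.gggg" yields the number "90" and leaves ".and.gggg" intact, while "90.5+x" still yields "90.5".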

Ken Domino · Answer 6 · Wed May 08 2024 21:19:12 GMT+0800 (China Standard Time)

You were calling NextToken() from the lexer. However, the code didn't work in the trgen-generated tests because of buffering, which caused some tokens not to appear. (If you generate a CSharp parser using trgen -t CSharp, update Fortran90LexerBase.cs with your code, build with make, and test with bash run.sh ../examples/missing-spaces.f90 -tokens, you will see the missing tokens.) I've updated #4085 for all but the JavaScript and Go ports. There are no PHP and TypeScript ports because Antlr PHP isn't well and because the TypeScript port cannot work with split grammars that have a superClass. Instead, I added the Antlr4ng port, which is as fast as CSharp or Java.

S Suehs · Answer 7 · Wed May 15 2024 21:51:10 GMT+0800 (China Standard Time)

Thank you!