antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

Fortran90: status can be a legit variable, not always a keyword

suehshtri opened this issue · comments

What I was doing: parsing some Fortran code, as one is wont to do.
What I expected: a parse with no errors.
What happened instead: errors

  • no viable alternative at input '(argument_A, ...
  • offending token: 'status'

In this case, status is a local variable. It looks like it gets caught by the STATUS token in the lexer.

The standard mentions STATUS (reference: f9x.book, page 111).

Fortran found in the wild uses a variable named 'status' outside the contexts in which it is a language keyword, and such code compiles with both the gfortran and Intel Fortran compilers.

GitHub refused the .f90 file as an attachment.

Reproduction appears to be as easy as declaring a variable integer(4) :: status in a subroutine.

Just to be clear, you are referring to the N692.pdf draft, the last revision before publishing ISO 1539:1990.

Normally, it should be parsed as an entity-decl. (E.g., in.txt out.txt) It doesn't say in the draft what an object-name is, but in ISO 1539:2023, it does: R804 object-name is name (p 104) => R603 name is letter [ alphanumeric-character ] ... C601 (R603) The maximum length of a name is 63 characters.

That implies that we should have the rules objectName : name; and name : NAME | STATUS | ...;. However, with such additional rules, ambiguity (and therefore performance) might become a problem. Still, it is something that should be tried.
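The idea of letting the name rule accept keyword tokens can be tried outside ANTLR first. What follows is a minimal Python sketch, not the grammar's actual implementation: the tokenizer, the KEYWORDS table, and parse_entity_decl are hypothetical and only cover declarations of the form integer(4) :: x. It shows that a lexer which always emits STATUS for the word status makes the declaration fail when name accepts only NAME, and succeed once STATUS is added as an alternative.

```python
import re

# Hypothetical keyword table: a real Fortran lexer has many more entries.
KEYWORDS = {"status": "STATUS", "integer": "INTEGER"}

TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z][A-Za-z0-9_]*)|(?P<int>\d+)|(?P<op>::|[(),]))"
)


def tokenize(text):
    """Return (type, text) pairs; words listed in KEYWORDS become keyword tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group("word"):
            word = m.group("word")
            tokens.append((KEYWORDS.get(word.lower(), "NAME"), word))
        elif m.group("int"):
            tokens.append(("ICON", m.group("int")))
        else:
            tokens.append((m.group("op"), m.group("op")))
    return tokens


def parse_entity_decl(tokens, name_types):
    """Parse `INTEGER ( ICON ) :: name`.

    name_types is the set of token types the `name` rule accepts,
    e.g. {"NAME"} or {"NAME", "STATUS"}.
    """
    expected_prefix = ["INTEGER", "(", "ICON", ")", "::"]
    if len(tokens) != 6 or [t for t, _ in tokens[:5]] != expected_prefix:
        raise SyntaxError("not a simple integer declaration")
    tok_type, text = tokens[5]
    if tok_type not in name_types:
        raise SyntaxError(f"no viable alternative at input {text!r}")
    return text
```

With name_types={"NAME"} the declaration integer(4) :: status is rejected, which mirrors the reported error; with name_types={"NAME", "STATUS"} it is accepted. The sketch says nothing about the ambiguity or performance cost in the real grammar, which would still need to be measured.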

Re N692: yes, you are right. I will be more precise in the future. Thank you for your patience.

ISO 1539:1990 is marked as withdrawn? I will go read about that.

I remember reading about island languages in one of the ANTLR books, and I might try something like that. The inside of the parentheses of OPEN, and maybe READ and WRITE, might be a mini-language. Ugh, hopefully never STATUS=STATUS. I will dig into the books again. I'm not sure I can rig an island lexer to dodge STATUS in the lexer.
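To make the "mini-language inside the parentheses" idea concrete, here is a rough Python sketch. It is not an ANTLR island grammar or lexer mode; it swaps in a simpler technique, retagging tokens after lexing, and every name in it (classify_status, SPEC_LIST_STATEMENTS) is hypothetical. It assumes that status is a keyword only when it appears directly inside the parentheses of OPEN or CLOSE, immediately followed by =, and is a plain name everywhere else.

```python
import re

# Words, quoted strings, integers, and the few symbols this sketch cares about.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_]*|'[^']*'|\d+|[()=,]")

# Assumption: only these statements take a STATUS= specifier in this sketch.
SPEC_LIST_STATEMENTS = {"open", "close"}


def classify_status(line):
    """Return 'STATUS' or 'NAME' for each occurrence of the word status in one statement."""
    texts = TOKEN.findall(line)
    in_spec_statement = bool(texts) and texts[0].lower() in SPEC_LIST_STATEMENTS
    depth = 0
    result = []
    for i, text in enumerate(texts):
        if text == "(":
            depth += 1
        elif text == ")":
            depth -= 1
        if text.lower() != "status":
            continue
        prev_text = texts[i - 1] if i > 0 else None
        next_text = texts[i + 1] if i + 1 < len(texts) else None
        is_keyword = (
            in_spec_statement
            and depth == 1                  # directly inside the specifier list
            and prev_text in ("(", ",")     # starts a specifier
            and next_text == "="            # keyword position: STATUS=
        )
        result.append("STATUS" if is_keyword else "NAME")
    return result
```

Under these assumptions even the dreaded STATUS=STATUS is handled: in open(unit=10, file='x.dat', status=status) the first occurrence is tagged STATUS and the second NAME, while call some_routine(status) yields only NAME.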

See #4106 for a fix.

I suggested a further test. I hope it helps. My test file worked but I quickly created another one to show a related problem.

Sorry, but where are these inputs? They are not with your comment, nor over in the thread for the PR. Please provide them.

Odd. The comments still say "Pending" on 4106. I'll copy them here.

Would a better test be
integer(4) :: status
since the original problem is triggered by status? The variable name is exactly 'status'.

I think this one will help, too.

    integer(4) :: status
    call some_routine(status)

I don't see any "pending" comments on #4106

You said "I quickly created another one to show a related problem." Is there a test input that #4106 fails on?

I updated the test https://github.com/kaby76/grammars-v4/blob/5bc2ef2e79d6efaaaee302d4768b8311239f79de/fortran/fortran90/examples/4105.f90 . To be thorough, I should try the other keywords in place of status in the test, verify that gfortran -std=f95 -c 4105.f90 works, and of course verify that the ANTLR parser works.

I'll paste the content of the file I used since it won't let me attach a file.

Github wants a file name suffix of .txt. Otherwise it won't do the attach.

kindpain.f90

module mymodule_0_pain
  implicit none
  contains

  subroutine my_sub(argument_A, argument_B, argument_C, argument_D, argument_E, argument_F, argument_G)

    integer(2), intent(in) :: argument_A
    integer(2), intent(in) :: argument_B
    integer(2), intent(in) :: argument_C
    integer(2), intent(in) :: argument_D
    integer(2), intent(in) :: argument_E
    integer(2), intent(in) :: argument_F
    integer(2), intent(out) :: argument_G
    integer(4) :: status
    integer(4) :: local_var1
    integer(4) :: local_var2

  end subroutine
end module

callstatuspain.f90

module mymodule_0_pain
  implicit none
  contains

  subroutine my_sub(argument_A, argument_B, argument_C, argument_D, argument_E, argument_F, argument_G)

    integer(2), intent(in) :: argument_A
    integer(2), intent(in) :: argument_B
    integer(2), intent(in) :: argument_C
    integer(2), intent(in) :: argument_D
    integer(2), intent(in) :: argument_E
    integer(2), intent(in) :: argument_F
    integer(2), intent(out) :: argument_G
    integer(4) :: status
    integer(4) :: local_var1
    integer(4) :: local_var2

    call some_routine(status)

  end subroutine
end module

Using a built and compiled C# parser:

----- C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90
 --SubPrograms--
 my_sub on C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90:5
 <missing NAME> on C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90:20
 --Errors--
Error: C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90:18: no viable alternative at input '(argument_A, argument_B, argument_C, argument_D, argument_E, argument_F, argument_G)\r\n\r\n    integer(2), intent(in) :: argument_A\r\n    integer(2), intent(in) :: argument_B\r\n    integer(2), intent(in) :: argument_C\r\n    integer(2), intent(in) :: argument_D\r\n    integer(2), intent(in) :: argument_E\r\n    integer(2), intent(in) :: argument_F\r\n    integer(2), intent(out) :: argument_G\r\n    integer(4) :: status\r\n    integer(4) :: local_var1\r\n    integer(4) :: local_var2\r\n\r\n    call some_routine(status'
       [@171,567:572='status',<98>,18:22] at 22 by SubroutineRangeContext
    expected: CONTAINS,ENTRY,END..INTENT,USE,DOUBLEPRECISION,ASSIGNSTMT..COMMON,REAL..EQUIVALENCE,POINTER,ACCESSSPEC..IMPLICIT,CHARACTER..IF,DO..CONTINUE,WHERE,SELECTCASE..SELECT,STOP,PAUSE..OPEN,CALL..DOUBLE,INQUIRE..REWIND,'(',ALLOCATE,';',COMPLEX,INTEGER..LOGICAL,DEALLOCATE..CYCLE,INTERFACE,ICON..EXIT,
Error: C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90:21: missing NAME at 'end'
       [@178,596:598='end',<12>,21:0] at 0 by
    expected:
Error: C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90:22: no viable alternative at input 'module\r\n'
       [@182,608:607='<EOF>',<-1>,22:0] at 0 by ModuleContext
    expected: MODULE,
----- END C:\Prototypes\XXXX.experiment.fortran\pain\callstatuspain.f90

My examples had a "c" in column 1, so call some_routine(s) turned into a comment.

! code starts in column 1
integer(4) :: status
call some_routine(status)
end

This brings to the fore the issue of source form (page 20 of the N692.pdf draft spec). It states: "There are two source forms: free and fixed. Free form and fixed form must not be mixed in the same program unit. The means for specifying the source form of a program unit are processor dependent." Clearly, we are not handling that correctly, because treating a "C" in column 1 as a comment is a fixed-form rule.
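The difference between the two source forms can be sketched in a few lines of Python. This is an illustration only, not how the grammar or any compiler decides: is_comment_line is a hypothetical helper, the caller must say which source form applies (the spec leaves that to the processor), and it only checks whole-line comments. It assumes the fixed-form rule that C or * in column 1 marks a comment line (lowercase c is accepted here too), and that ! starts a comment in either form, except in column 6 of fixed form.

```python
def is_comment_line(line, source_form):
    """Return True if the whole line is a comment under the given source form.

    source_form must be "fixed" or "free"; the choice is an input here
    because the spec leaves the selection mechanism to the processor.
    """
    if source_form == "fixed":
        # Fixed form: 'C' or '*' in column 1 marks a comment line.
        if line[:1] in ("C", "c", "*"):
            return True
        # A line whose first nonblank character is '!' is also a comment,
        # unless that '!' sits in column 6 (the continuation column).
        stripped = line.lstrip(" ")
        column = len(line) - len(stripped) + 1
        return stripped.startswith("!") and column != 6
    if source_form == "free":
        # Free form: only '!' starts a comment; a leading 'c' is ordinary text.
        return line.lstrip().startswith("!")
    raise ValueError(f"unknown source form: {source_form!r}")
```

Applied to the line call some_routine(status) starting in column 1, the helper reports a comment under "fixed" and an ordinary statement under "free", which is exactly the confusion described above.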

Further, the parse problem is due to expressions. The grammar defines primary here, but its alternatives are in some bogus order, with non-terminals that don't even match the N692.pdf draft spec (page 68, rule R701). The spec has nothing remotely named nameDataRef (defined here). This needs to be completely rewritten, and we need to determine why it was written this way to begin with.

Also, your example takes 0.2s to parse. That's really slow! Looks like the problem there is subroutineRange, which requires a huge max-k:

$ trperf x.x -h | column -t
Time to parse: 00:00:00.2091210
File  Decision  Rule                           Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
x.x   0         executableProgram              0            0         0        0      0         0            0       0
...
x.x   289       subroutineRange                1            0.001803  1        1      0         0            0       1
x.x   290       subroutineRange                1            0.960394  157      157    0         0            0       22
x.x   291       implicitStmt                   1            0.011732  3        3      0         0            0       2