EOF not recognized on corner case specification
kuniss opened this issue
Minimal erroneous EAG specification:
NoEOF<+ 'Done': Done>: 'x'.
Done = 'Done'.
The only recognized sentence should be a single 'x':
~/git/gamma/build$ echo x | ./NoEOF
info: NoEOF compiler (generated with epsilon)
Done
But it also compiles successfully on input 'xx':
~/git/gamma/build$ echo xx | ./NoEOF
info: NoEOF compiler (generated with epsilon)
Done
In fact, it compiles any input, no matter what follows the first 'x':
~/git/gamma/build$ echo xsakdjf7 | ./NoEOF
info: NoEOF compiler (generated with epsilon)
Done
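A minimal sketch of how this symptom can arise in a top-down parser, assuming a hand-written recursive-descent parser rather than the code gamma actually generates: the parser returns as soon as the start symbol has been recognized, so anything after the first 'x' is never examined unless the parser also checks for the end of the input. The `Parser` class, the `accepts` function and the `require_eof` flag are hypothetical, and every non-whitespace character is treated as one token.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the grammar."""


class Parser:
    """Recursive-descent parser for the one-rule grammar  NoEOF: 'x'.

    Hypothetical sketch: each non-whitespace character is one token.
    """

    def __init__(self, text: str, require_eof: bool) -> None:
        self.tokens = [c for c in text if not c.isspace()]
        self.pos = 0
        self.require_eof = require_eof

    def expect(self, terminal: str) -> None:
        # Consume the next token if it is the expected terminal.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == terminal:
            self.pos += 1
        else:
            raise ParseError(f"expected {terminal!r} at token {self.pos}")

    def parse(self) -> str:
        self.expect("x")  # NoEOF: 'x'.
        # Without this check, whatever follows the first 'x' is silently ignored.
        if self.require_eof and self.pos != len(self.tokens):
            raise ParseError(f"unexpected {self.tokens[self.pos]!r} after end of sentence")
        return "Done"


def accepts(text: str, require_eof: bool) -> bool:
    try:
        Parser(text, require_eof).parse()
        return True
    except ParseError:
        return False
```

With `require_eof=False`, `accepts("xx\n", False)` returns True, mirroring the report above; with the check enabled, only a lone 'x' passes.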
This was once a feature of Oberon:
If the source code is MODULE Compiler; ... END Compiler.
then the parser stops at the final dot.
The rest of the file was used for test commands:
https://github.com/linkrope/gamma/blob/master/test/oberon0/Sample.Mod#L46-L52
Typically the last line of the source code was something like
END Compiler.Compile *
where you selected Compiler.Compile *
in the Oberon system to run the module's Compile command.
That was pretty cool back then.
While this works for top-down parsers, bottom-up parsers usually introduce the extra rule
S' -> S <EOF>
The end symbol is the required lookahead to finally reduce everything on the stack.
So it could be difficult to reproduce the Oberon behavior with the bottom-up parser.
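To illustrate the point about the augmented rule, here is a hand-built LR(1) table for the hypothetical grammar S' -> S <EOF>, S -> 'x' (a sketch, not the tables gamma generates). The reduction S -> 'x' is only listed for the lookahead <EOF>, so the parser can finish only after it has seen the end of the input, and 'xx' is rejected. This is why stopping early, Oberon style, does not come for free in a bottom-up parser.

```python
# Augmented grammar (hypothetical rule numbering):
#   rule 0: S' -> S <EOF>
#   rule 1: S  -> 'x'
EOF = "<EOF>"

# ACTION table: (state, lookahead) -> (kind, argument)
ACTION = {
    (0, "x"): ("shift", 2),
    (1, EOF): ("accept", None),
    (2, EOF): ("reduce", 1),  # S -> 'x' is reduced only when the lookahead is <EOF>
}
GOTO = {(0, "S"): 1}
RULES = {1: ("S", 1)}  # rule number -> (left-hand side, length of right-hand side)


def lr_parse(text: str) -> bool:
    """Shift-reduce parse; each non-whitespace character is one token."""
    tokens = [c for c in text if not c.isspace()] + [EOF]
    stack = [0]  # stack of LR states
    pos = 0
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            return False  # no table entry: syntax error
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True  # accept on <EOF>
```

For "xx", the parser shifts the first 'x' into state 2 and then finds no action for the lookahead 'x', because state 2 only reduces on <EOF>.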
On the other hand, removing this "feature" would break the test cases for Sample.Mod.
You would no longer be able to generate an Oberon compiler...
Didn't know that, even back then.
What do you think about making it a special generator option, since it is quite unusual for other languages?
I guess it only works if the last symbol in the grammar is a terminal, doesn't it?
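One way to read this question, sketched with a hypothetical top-down parser that never checks for the end of the input: for `Module: 'x' { 'x' } '.'` the parser stops right after the final '.', independent of what follows, while for `Module: 'x' { 'x' }` the repetition has to peek at the next token, so trailing text that happens to start with 'x' is swallowed into the sentence. The function `consumed`, which returns the number of tokens read, and both toy grammars are assumptions for illustration only.

```python
class ParseError(Exception):
    """Raised when the input does not start with a sentence of the grammar."""


def consumed(text: str, ends_with_terminal: bool) -> int:
    """Parse  Module: 'x' { 'x' } '.'  (ends_with_terminal=True)
    or      Module: 'x' { 'x' }      (ends_with_terminal=False)
    and return how many tokens were read; the end of input is never checked."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        # Next token, or None at the end of the input.
        return tokens[pos] if pos < len(tokens) else None

    if peek() != "x":
        raise ParseError("expected 'x'")
    pos += 1
    # { 'x' }: the loop must inspect the next token to decide whether to continue.
    while peek() == "x":
        pos += 1
    if ends_with_terminal:
        if peek() != ".":
            raise ParseError("expected '.'")
        pos += 1  # stop here: nothing after the final '.' is ever examined
    return pos
```

With the final '.', `consumed("x x . x x", True)` stops after 3 tokens and ignores the trailing text, much like the Oberon test commands; without it, `consumed("x x x", False)` reads all 3 tokens, because the parser cannot tell where the sentence was meant to end.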