GerHobbelt / jison

bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)

Home Page:https://gerhobbelt.github.io/jison/

GnuCOBOL

simon-p-r opened this issue

Hi

I want to use jison to generate a parser for the GnuCOBOL grammar file. Can jison do this? I have already tried, but there were some errors.

Thanks
Simon

I don't know the grammar for GnuCOBOL, but a quick google delivers these:

which imply that GnuCobol either has a custom parser or uses a non-LALR(1)/LR(1) bison parser type which resolves those problems. I don't know whether today's bison supports Earley grammars and such, which might handle the issues mentioned above, but jison doesn't currently go beyond LALR(1)/LR(1) parsing, so YMMV and I guess the answer is 'no' for both jison and bison.

Had a quick look at the gnucobol 2.2 source tar: they have two yacc grammars in there, so I have no idea how they tackled the issues mentioned while sticking with yacc/bison. I don't see any special options in the yacc files or in the Makefile.am which would instruct bison to do anything special.

Fundamentally, from a grammar perspective, jison should be able to handle it if vanilla bison can.

However, a quick initial attempt to feed the .y files to bleeding edge jison turns up a few issues:

  • the bison grammars use in-rule / pre-rule actions like these:
A: { /* action XA */ } B ...;

which is something jison DOES NOT support out of the box: what you MAY do for such rules is add %epsilon rules to simulate this behaviour (that's what classic yacc/bison does too):

A: init_action_rule B ...;

init_action_rule: %epsilon { /* action XA */ };
  • a few other details such as a %token TOKEN_EOF 0 "end of file" statement which overrides the internal EOF token def with a description attribute, etc.: that's something jison-gho SHOULD have accepted, EXCEPT for EOF, and there's something buggy there...
  • another rule in that grammar (haven't located which one yet) causes jison-gho to emit an error report, which in turn crashes with an internal error: location tracking fails because the erroneous spot happens to be an epsilon rule:
            throw err;
            ^

TypeError: Cannot read property 'first_line' of undefined
    at Object.lexer_prettyPrintRange [as prettyPrintRange] (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:22780:71)
    at Object.parser__PerformAction (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:18608:27)
    at Object.parse (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:21686:40)
    at Object.parse (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:24377:23)
    at autodetectAndConvertToJSONformat (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:25559:32)
    at new Jison_Generator (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:33713:15)
    at Object.generateParserString (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:34263:25)
    at processInputFile (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:34174:30)
    at Object.cliMain [as main] (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:34252:13)
    at Object.<anonymous> (W:\Users\Ger\Projects\sites\library.visyond.gov\80\lib\tooling\jison\dist\cli-cjs.js:34280:9)

which is another bug in jison.

  • upon quick inspection the GnuCobol grammar also uses some very non-trivial action reference code in some places: $-1, which is a negative-index token value reference. jison-gho DOES support that kind of advanced action code reference, but I'm not sure you'll end up with exactly the same behaviour as bison there, as I never tested that feature for exact bison compatibility.

Quoting the bison manual at https://www.gnu.org/software/bison/manual/html_node/Actions.html (bold emphasis mine):

$n with n zero or negative is allowed for reference to tokens and groupings on the stack before those that match the current rule.
This is a very risky practice, and to use it reliably you must be certain of the context in which the rule is applied. Here is a case in which you can use this reliably:

foo:
  expr bar '+' expr  { … }
| expr bar '-' expr  { … }
;
bar:
  %empty    { previous_expr = $0; }
;

As long as bar is used only in the fashion shown here, $0 always refers to the expr which precedes bar in the definition of foo.
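To make the quoted rule more tangible, here is a minimal sketch of how $n references map onto an LR value stack. This is a toy model written for illustration, not jison's or bison's actual runtime; the ref helper and the string values on the stack are made up. It assumes the usual layout where the symbols of the rule being reduced sit on top of the stack.

```javascript
// Toy model of an LR value stack; NOT the real jison/bison runtime.
// ref(stack, rhsLength, n) resolves $n for a rule whose right-hand side
// currently occupies the top rhsLength slots of the stack.
function ref(stack, rhsLength, n) {
  // $1 is the deepest symbol of the rule, $rhsLength is the top of stack;
  // $0, $-1, ... reach below the rule into whatever was parsed before it.
  return stack[stack.length - rhsLength + n - 1];
}

const stack = ['context'];          // whatever was parsed before `foo` started
stack.push('lhs-expr');             // value of the first `expr`

// Reduce `bar: %empty { previous_expr = $0; }`: nothing is popped,
// so $0 is simply the current top of the stack.
const previousExpr = ref(stack, 0, 0);
const beforeThat = ref(stack, 0, -1); // $-1: depends entirely on the context
stack.push('bar-value');            // value produced by `bar`

stack.push('+');
stack.push('rhs-expr');             // value of the second `expr`

// Inside the action of `foo: expr bar '+' expr`, $1..$4 are in range.
const firstOperand = ref(stack, 4, 1);
const secondOperand = ref(stack, 4, 4);

console.log(previousExpr, beforeThat, firstOperand, secondOperand);
```

The point of the model: $0 and $-1 are only meaningful if every place that uses bar puts the same kind of symbol right before it. Whether jison-gho lays out its stack in exactly this way for negative indices is the part that has not been verified against bison.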

Bottom line: porting that grammar/those grammars is a non-trivial exercise.
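The first step mentioned above (turning in-rule actions into separate %epsilon rules) is mechanical enough to sketch. The helper below is hypothetical and not part of jison: it works on a made-up in-memory rule shape ({ lhs, rhs }), not on real .y files, and the generated rule names are arbitrary. It also shifts $n references inside the hoisted action, since symbols that were $1, $2, ... in the enclosing rule end up below the new empty rule, i.e. at $0, $-1, ...

```javascript
// Hypothetical helper, not part of jison: rewrites one rule so that every
// in-rule (mid-rule) action becomes a separate empty rule.
// Assumed rule shape: { lhs: string, rhs: Array<string | { action: string }> }
function hoistMidRuleActions(rule) {
  const rules = [];
  const rhs = [];
  let counter = 0;
  rule.rhs.forEach((item, index) => {
    const isAction = typeof item === 'object';
    const isLast = index === rule.rhs.length - 1;
    if (!isAction || isLast) {
      rhs.push(item); // symbols and the final rule action stay in place
      return;
    }
    counter += 1;
    const name = `${rule.lhs}_action_${counter}`;
    // `index` symbols precede this action, so $index becomes $0,
    // $(index - 1) becomes $-1, and so on. `$$` is left untouched.
    const action = item.action.replace(
      /\$(-?\d+)/g,
      (_, n) => `$${Number(n) - index}`
    );
    rules.push({ lhs: name, rhs: [{ action }] }); // the new empty rule
    rhs.push(name); // the new nonterminal takes the action's position
  });
  rules.push({ lhs: rule.lhs, rhs });
  return rules;
}

// Usage: A: X { use($1); } B { $$ = $3; };
const hoisted = hoistMidRuleActions({
  lhs: 'A',
  rhs: ['X', { action: 'use($1);' }, 'B', { action: '$$ = $3;' }],
});
console.log(JSON.stringify(hoisted, null, 2));
```

Because the new nonterminal occupies the same position as the action it replaces, references in the final action (such as $3 here) keep their numbering. The hoisted action now relies on $0-style references, so the caveat about untested bison compatibility applies to the result.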

Wow @GerHobbelt thanks for your feedback, can I send you an email to discuss this further?

Sorry, I get rather swamped with email, so that won't work. Besides, there's the day job to keep in mind. ;-) Second, using this channel scales better as this stuff is potentially useful for other folks who visit at some point in the future and are looking for this or similar info.

Also, if others want to chime in, correct or improve on what is discussed here, that is possible. Email is for more direct address - not advised unless there are fees involved.

The GitHub issue tracker isn't exactly meant for this type of issue, but that's okay with me: it's sideways related to jison-gho and has helped uncover a few bugs already. (Also, I don't take time to visit SO (Stack Overflow) often; other matters have top priority.)


Bottom line / TL;DR: let's stay in this channel... 😉

Closing due to age; please open a fresh issue, referencing this one for future ref on the same or similar subject.