ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.

Visiting without changing the input actually outputs a different value

ericmorand opened this issue

I'm trying to do the most basic thing with Ohm: visit a match result and output the input unchanged.

Here is the code that I use:

import {grammar, Node} from "ohm-js";

const myGrammar = grammar(`MyGrammar {
  Operations = nonemptyListOf<op, ",">

  op = "op(" digit ")"
}`);

const semantics = myGrammar.createSemantics();

semantics.addOperation<string>('passthrough', {
    _nonterminal(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough()).join('');
    },
    _terminal(this: Node) {
        return this.sourceString;
    },
    _iter(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough()).join('');
    }
})

const val = semantics(myGrammar.match('op(1),op(3),op(4)')).passthrough();

console.log(val); // op(1),,op(3)op(4)

I simply define a passthrough operation that outputs the source string of each node untouched. Still, the output differs from the input:

input: op(1),op(3),op(4)
output: op(1),,op(3)op(4)

I have no idea where these two consecutive , characters come from, or why the one between the last two ops is missing.

Note that this is not caused by the .join('') calls: removing them outputs an array that does not represent the input either:

[[["op(",["1"],")"],[",",","],[["op(",["3"],")"],["op(",["4"],")"]]]]

Is it a bug or am I missing something obvious here?

Investigating further, I noticed that the issue appears when there are more than two nodes in the iteration:

op(1),op(2) => op(1),op(2)
op(1),op(2),op(3) => op(1),,op(2)op(3)

It looks more and more like a bug. :)

I've added some debug info to my code and rewrote the grammar to be the simplest possible:

import {grammar, Node} from "ohm-js";

const myGrammar = grammar(`MyGrammar {
  Operations = nonemptyListOf<digit, ",">
}`);

const semantics = myGrammar.createSemantics();

semantics.addOperation<any>('passthrough', {
    _nonterminal(this: Node, ...nodes) {
        console.log('NON TERMINAL', this.sourceString, this.ctorName);

        return this.children.map((node) => {
            console.log('  CHILD', node.sourceString, node.ctorName);

            return node.passthrough();
        });
    },
    _terminal(this: Node) {
        console.log('    TERMINAL', this.sourceString, this.ctorName);

        return this.sourceString;
    },
    _iter(this: Node, ...nodes) {
        return nodes.map((node) => node.passthrough());
    }
})

const val = semantics(myGrammar.match('1,2,3,4')).passthrough();

console.log(JSON.stringify(val));

Here is the formatted debug output:

NON TERMINAL 1,2,3,4 Operations
  CHILD 1,2,3,4 nonemptyListOf
    NON TERMINAL 1,2,3,4 nonemptyListOf
      CHILD 1 digit
        NON TERMINAL 1 digit
          CHILD 1 _terminal
            TERMINAL 1 _terminal
      CHILD ,2,3,4 _iter
        TERMINAL , _terminal
        TERMINAL , _terminal
        TERMINAL , _terminal
      CHILD ,2,3,4 _iter
        NON TERMINAL 2 digit
          CHILD 2 _terminal
            TERMINAL 2 _terminal
        NON TERMINAL 3 digit
          CHILD 3 _terminal
            TERMINAL 3 _terminal
        NON TERMINAL 4 digit
          CHILD 4 _terminal
            TERMINAL 4 _terminal

What we see here is that the nonemptyListOf node has 3 children:

  • The first one is the 1 digit
  • The second one is an _iter that contains all the comma terminal nodes
  • The third one is an _iter that contains all the other digits

I'm puzzled. Why does the nonemptyListOf node have three children (1, ,,,, 234) instead of seven (1, ,, 2, ,, 3, ,, 4)?

Is it expected? If so, how can we reconstruct the input by visiting the nodes?

That is expected, yes. We should definitely improve the documentation about this.

It's due to the way repetition operators (e.g. *, +) are dealt with in semantic actions. If you have a rule like line = one ("," two)+, its semantic action takes three arguments:

line(one, commas, twos) {
    ...
}

Generally this makes writing semantic actions easier (we think), but it can be unintuitive at first.
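
To make the grouping concrete, here is a minimal sketch in plain TypeScript that does not use Ohm at all. It only models the behaviour described above: the pieces matched by a repeated group are handed over per position in the group (all separators together, all elements together) rather than per iteration. The names first, iterations and toColumns are hypothetical and exist only for this illustration.

```typescript
// Hypothetical model of matching `line = one ("," two)+` against "1,2,3,4".
// Each inner array is one pass through the repeated group ("," two).
const first: string = "1";
const iterations: string[][] = [[",", "2"], [",", "3"], [",", "4"]];

// Group the matched pieces by their position inside the repeated group
// (column) instead of by iteration (row).
function toColumns(rows: string[][]): string[][] {
  if (rows.length === 0) return [];
  return rows[0].map((_, col) => rows.map((row) => row[col]));
}

const [commas, twos] = toColumns(iterations);
// commas is [",", ",", ","] and twos is ["2", "3", "4"]

// Concatenating the arguments in order, as a generic passthrough does,
// puts all separators before all remaining elements.
const naive = [first, commas.join(""), twos.join("")].join("");
// naive is "1,,,234"
```

This is the same shape as the debug output earlier in the thread: one first element, one group of separators, one group of remaining elements.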

It's difficult to write an operation that will reconstruct the input using only the special actions (_terminal, _nonterminal, _iter). Probably the easiest thing to do would be to add a nonemptyListOf action to your operation.

It is not very elegant, and there may be a better way, but it works:

semantics.addOperation<any>('passthrough', {
    nonemptyListOf(first, separators, rest) {
        return [
            first.passthrough(),
            rest.children.map((node, index) => {
                const separatorNode = separators.children[index];

                return [
                    separatorNode.passthrough(),
                    node.passthrough()
                ].join('');
            }).join('')
        ].join('');
    },
    _nonterminal(this: Node, ...nodes) {
        return this.children.map((node) => {
            return node.passthrough();
        }).join('');
    },
    _terminal(this: Node) {
        return this.sourceString;
    }
})

1,2,3,4
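
The pairing done inside the nonemptyListOf action can also be checked in isolation. The following is a sketch on plain strings, not on Ohm nodes; interleaveList is a hypothetical helper name, and it assumes there is exactly one separator per remaining element, which is what the debug output above suggests.

```typescript
// Rebuild a separated list from its three parts: the first element,
// the separators, and the remaining elements.
// Assumption: separators[i] is the separator that precedes rest[i].
function interleaveList(
  first: string,
  separators: string[],
  rest: string[],
): string {
  if (separators.length !== rest.length) {
    throw new Error("expected one separator per remaining element");
  }
  // Append each separator followed by its element, starting from `first`.
  return rest.reduce((acc, item, i) => acc + separators[i] + item, first);
}

const rebuilt = interleaveList("1", [",", ",", ","], ["2", "3", "4"]);
// rebuilt is "1,2,3,4"
```

Throwing on a length mismatch is a design choice for this sketch, so that a wrong assumption about the node shape fails loudly instead of silently dropping a separator.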

Thanks a lot for your help. :)