Generated code parser and dynamic parser return different ASTs

Question

dnicolodi opened this issue 2 years ago

I found a difference between the AST returned by the dynamic parser and the generated code parser. Here is a minimal example:

```python
import tatsu
import textwrap
import types

# Raw string so that \d reaches the grammar unchanged (no invalid-escape warning).
grammar = textwrap.dedent(r'''
    test::Test = 'TEST' ['A' a:number] ['B' b:number] ;
    number::int = /\d+/ ;
    ''')

# Dynamic parser, compiled from the grammar at runtime.
parser = tatsu.compile(grammar)

ast = parser.parse('TEST')
print(ast)

ast = parser.parse('TEST A 1')
print(ast)

# Generated-code parser: emit Python source and load it as a module.
code = tatsu.to_python_sourcecode(grammar, name='Test', filename='test.py')
module = types.ModuleType('test')
module.__file__ = 'test.py'
exec(compile(code, module.__file__, 'exec'), module.__dict__)

ast = module.TestParser().parse('TEST', start='test')
print(ast)

ast = module.TestParser().parse('TEST A 1', start='test')
print(ast)
```

The dynamic parser always returns a dict with two keys, "a" and "b", while the parser from the generated code parses 'TEST' to just 'TEST'. The output of the above code is:

```
{'b': None, 'a': None}
{'b': None, 'a': '1'}
TEST
{'a': '1', 'b': None}
```
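Reading the printed results as plain Python values, the two outputs for 'TEST A 1' differ only in key order, which dict equality ignores; the real discrepancy is the 'TEST' case. A minimal check, with the values copied by hand from the output above (the variable names are only for illustration):

```python
# Results copied from the printed output above, as plain Python values.
dynamic_test = {'b': None, 'a': None}
dynamic_test_a1 = {'b': None, 'a': '1'}
generated_test = 'TEST'
generated_test_a1 = {'a': '1', 'b': None}

# Dict equality ignores insertion order, so the 'TEST A 1' results agree.
print(dynamic_test_a1 == generated_test_a1)  # True

# For input 'TEST' the generated parser yields a string instead of a mapping.
print(dynamic_test == generated_test)        # False
print(isinstance(generated_test, dict))      # False
```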

It is also surprising that the generated code parser requires the name of the grammar's start rule to be passed explicitly, while the dynamic parser does not.

Juancarlo Añez · Answer 1 · Sun Jul 10 2022 00:05:41 GMT+0800 (China Standard Time)

The problem comes from the options of a choice being able to return ASTs with completely different structures, which is the desired behavior.

In this case it seems that the behavior of the dynamic parser is the correct one.

Daniele Nicolodi · Answer 2 · Sun Jul 10 2022 00:19:30 GMT+0800 (China Standard Time)

I agree that the behavior of the dynamic parser is the correct one. I think the difference in behavior comes from ParseContext._define doing nothing if the current AST node is empty. However, relaxing the condition at the beginning of _define causes some test cases to fail, so I need to spend more time analyzing what the code does.
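To make the suspected mechanism concrete, here is a toy model of it. This is not TatSu's code: `finish_rule` and its `guard_empty` flag are hypothetical, and the step where an empty node falls back to the last matched token is an assumption inferred only from the 'TEST' output above.

```python
from typing import Union


def finish_rule(ast: dict, last_token: str, keys: list[str], guard_empty: bool) -> Union[dict, str]:
    """Toy model of closing a rule; hypothetical, not TatSu's implementation."""
    # The suspected guard: with guard_empty=True, nothing is defined on an empty node.
    if ast or not guard_empty:
        for key in keys:
            ast.setdefault(key, None)  # pre-set named elements that did not match
    # Assumption: an empty node is dropped and the last matched token is returned.
    return ast if ast else last_token


print(finish_rule({}, 'TEST', ['a', 'b'], guard_empty=True))          # TEST
print(finish_rule({}, 'TEST', ['a', 'b'], guard_empty=False))         # {'a': None, 'b': None}
print(finish_rule({'a': '1'}, 'TEST', ['a', 'b'], guard_empty=True))  # {'a': '1', 'b': None}
```

With the guard in place the model reproduces the generated parser's 'TEST' result; without it, both inputs produce a dict with keys "a" and "b", as the dynamic parser does.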

Juancarlo Añez · Answer 3 · Sun Jul 10 2022 00:35:58 GMT+0800 (China Standard Time)

The problem is that the expression doing the defines should be the first-level sequence in the right-hand side (rhs) of a rule. It would work for rules that have a choice as their main expression, like in this case.
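As a rough sketch of why the position of the named elements matters, the toy model below collects every name bound anywhere in a rule's right-hand side, descending through optionals, sequences, and choices instead of looking only at one level. The classes `Token`, `Call`, `Named`, `Option`, `Sequence`, `Choice` and the `defines` function are hypothetical stand-ins, not TatSu's grammar model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Token:      # literal such as 'TEST'
    text: str


@dataclass
class Call:       # reference to another rule, such as number
    rule: str


@dataclass
class Named:      # name:exp
    name: str
    exp: Expr


@dataclass
class Option:     # [exp]
    exp: Expr


@dataclass
class Sequence:   # exp exp ...
    items: list[Expr]


@dataclass
class Choice:     # exp | exp ...
    items: list[Expr]


Expr = Union[Token, Call, Named, Option, Sequence, Choice]


def defines(exp: Expr) -> list[str]:
    """Names bound anywhere inside exp, in grammar order, without duplicates."""
    if isinstance(exp, Named):
        found = [exp.name, *defines(exp.exp)]
    elif isinstance(exp, Option):
        found = defines(exp.exp)
    elif isinstance(exp, (Sequence, Choice)):
        found = [name for item in exp.items for name in defines(item)]
    else:
        found = []  # tokens and rule calls bind no names themselves
    return list(dict.fromkeys(found))


# test = 'TEST' ['A' a:number] ['B' b:number] ;
test_rhs = Sequence([
    Token('TEST'),
    Option(Sequence([Token('A'), Named('a', Call('number'))])),
    Option(Sequence([Token('B'), Named('b', Call('number'))])),
])
print(defines(test_rhs))                                    # ['a', 'b']
print(defines(Choice([test_rhs, Named('c', Token('C'))])))  # ['a', 'b', 'c']
```

In this model, pre-setting each collected name to None when a rule starts would give {'a': None, 'b': None} for 'TEST' however the rhs is shaped, which matches the dynamic parser's output.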

Juancarlo Añez · Answer 4 · Sun Jul 10 2022 00:39:38 GMT+0800 (China Standard Time)

Fixed in commit 852f657.

I was in a hurry, and skipped the pull request by mistake.