Generated code parser and dynamic parser return different ASTs
dnicolodi opened this issue
I found a difference between the AST returned by the dynamic parser and the generated code parser. Here is a minimal example:
import tatsu
import textwrap
import types
grammar = textwrap.dedent(r'''
test::Test = 'TEST' ['A' a:number] ['B' b:number] ;
number::int = /\d+/ ;
''')
parser = tatsu.compile(grammar)
ast = parser.parse('TEST')
print(ast)
ast = parser.parse('TEST A 1')
print(ast)
code = tatsu.to_python_sourcecode(grammar, name='Test', filename='test.py')
module = types.ModuleType('test')
module.__file__ = 'test.py'
exec(compile(code, module.__file__, 'exec'), module.__dict__)
ast = module.TestParser().parse('TEST', start='test')
print(ast)
ast = module.TestParser().parse('TEST A 1', start='test')
print(ast)
The dynamic parser always returns a dict with the two elements "a" and "b", while the parser from the generated code parses 'TEST' to the plain string 'TEST'. The output of the above code is:
{'b': None, 'a': None}
{'b': None, 'a': '1'}
TEST
{'a': '1', 'b': None}
It is also surprising that the generated code parser requires the name of the grammar's start rule to be passed explicitly, while the dynamic parser does not.
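Until the generated parser picks a default start rule, one possible workaround is to pass the first rule's name yourself. The sketch below assumes, as the first example suggests, that the dynamic parser starts from the first rule in the grammar. `first_rule_name` is a hypothetical helper, not part of TatSu, and its regular expression only recognizes simple rule headers such as `test::Test =` (it ignores directives, decorators, rule parameters and comments).

```python
import re

# Naive pattern for simple rule headers like `test::Test =` or `start =`.
# Hypothetical helper: directives (@@...), decorators, rule parameters and
# comments are NOT handled.
RULE_HEADER = re.compile(r'^\s*(\w+)\s*(?:::\s*\w+\s*)?=', re.MULTILINE)


def first_rule_name(grammar):
    """Return the name of the first rule defined in `grammar`."""
    match = RULE_HEADER.search(grammar)
    if match is None:
        raise ValueError('no rule definition found')
    return match.group(1)


GRAMMAR = r'''
test::Test = 'TEST' ['A' a:number] ['B' b:number] ;
number::int = /\d+/ ;
'''

print(first_rule_name(GRAMMAR))  # test
```

With the generated module from the example above, this would be used as `module.TestParser().parse('TEST', start=first_rule_name(grammar))`.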
The problem comes from the fact that different options in a choice can return ASTs with completely different structures, which is the desired behavior.
In this case it seems that the behavior of the dynamic parser is the correct one.
I agree that the behavior of the dynamic parser is the correct one. I think the difference comes from ParseContext._define
doing nothing when the current AST node is empty. However, relaxing the condition at the beginning of _define
causes some test cases to fail, so I need to spend more time analyzing what the code does.
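The suspected cause can be illustrated with a small model. This is a hypothetical simplification, not TatSu's actual `ParseContext._define`: `define` and `rule_result` are illustrative names, and the "fall back to the last matched token" step is an assumption chosen to reproduce the `TEST` output shown earlier. It shows how an early return on an empty node makes the input `TEST` produce a bare token instead of a dict with `a` and `b` set to `None`.

```python
# Hypothetical model of the suspected behaviour; NOT TatSu's real code.

def define(ast, names, skip_if_empty):
    """Return a copy of `ast` with every name in `names` present
    (defaulting to None), or None when the early return applies."""
    if skip_if_empty and not ast:
        # Suspected early return: an empty node is left untouched.
        return None
    node = dict(ast)
    for name in names:
        node.setdefault(name, None)
    return node


def rule_result(ast, last_token, names, skip_if_empty):
    """Model what a rule such as `test` ends up returning."""
    node = define(ast, names, skip_if_empty)
    # In this model, a rule with no defined fields falls back to the
    # last token it matched, e.g. 'TEST'.
    return node if node is not None else last_token


# Input 'TEST': neither optional matched, so no field was set.
print(rule_result({}, 'TEST', ['a', 'b'], skip_if_empty=True))   # TEST
print(rule_result({}, 'TEST', ['a', 'b'], skip_if_empty=False))  # {'a': None, 'b': None}
# Input 'TEST A 1': field 'a' was set, so both variants agree.
print(rule_result({'a': '1'}, '1', ['a', 'b'], skip_if_empty=True))  # {'a': '1', 'b': None}
```

In the model, `skip_if_empty=True` mimics the generated parser's output and `skip_if_empty=False` mimics the dynamic parser's.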
The problem is that the defines should be done by the first-level sequence on the right-hand side of a rule. That would also work for rules whose main expression is a choice, as in this case.
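One way to read this: the named fields to predefine are collected from the rule's first-level sequence, looking inside optionals and nested sequences, and when the rule's right-hand side is a choice, each option gets its own set, since options may return differently shaped ASTs. The sketch below models that reading with hypothetical expression classes (`Token`, `RuleRef`, `Named`, `Opt`, `Sequence`, `Choice`); they are illustrative only and do not correspond to TatSu's internal grammar classes or to the actual fix.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical grammar-expression model, for illustration only.


@dataclass
class Token:
    text: str


@dataclass
class RuleRef:
    name: str


@dataclass
class Named:
    name: str
    exp: object


@dataclass
class Opt:
    exp: object


@dataclass
class Sequence:
    items: List[object]


@dataclass
class Choice:
    options: List[object]


def named_fields(exp):
    """Collect names bound in one sequence, looking inside optionals and
    nested sequences but not into a nested choice."""
    if isinstance(exp, Named):
        return [exp.name]
    if isinstance(exp, Opt):
        return named_fields(exp.exp)
    if isinstance(exp, Sequence):
        names = []
        for item in exp.items:
            names.extend(named_fields(item))
        return list(dict.fromkeys(names))  # dedupe, keep order
    return []  # tokens, rule references and nested choices


def defines_for_rule(rhs):
    """Names to predefine as None for each top-level alternative of a rule."""
    if isinstance(rhs, Choice):
        return [named_fields(option) for option in rhs.options]
    return [named_fields(rhs)]


# test = 'TEST' ['A' a:number] ['B' b:number] ;
test_rhs = Sequence([
    Token('TEST'),
    Opt(Sequence([Token('A'), Named('a', RuleRef('number'))])),
    Opt(Sequence([Token('B'), Named('b', RuleRef('number'))])),
])
print(defines_for_rule(test_rhs))  # [['a', 'b']]
```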
Fixed in commit 852f657.
I was in a hurry and skipped the pull request by mistake.