neogeny / TatSu

竜 TatSu generates Python parsers from grammars in a variation of EBNF

Home Page: https://tatsu.readthedocs.io/

Generated code parser and dynamic parser return a different AST

dnicolodi opened this issue

I found a difference between the AST returned by the dynamic parser and the generated code parser. Here is a minimal example:

import tatsu
import textwrap
import types

grammar = textwrap.dedent(r'''
    test::Test = 'TEST' ['A' a:number] ['B' b:number] ;
    number::int = /\d+/ ;
    ''')

parser = tatsu.compile(grammar)

ast = parser.parse('TEST')
print(ast)

ast = parser.parse('TEST A 1')
print(ast)

code = tatsu.to_python_sourcecode(grammar, name='Test', filename='test.py')
module = types.ModuleType('test')
module.__file__ = 'test.py'
exec(compile(code, module.__file__, 'exec'), module.__dict__)

ast = module.TestParser().parse('TEST', start='test')
print(ast)

ast = module.TestParser().parse('TEST A 1', start='test')
print(ast)

The dynamic parser always returns a dict with the two keys "a" and "b", whereas the parser from the generated code parses 'TEST' to just 'TEST'. The output of the above code is:

{'b': None, 'a': None}
{'b': None, 'a': '1'}
TEST
{'a': '1', 'b': None}

It is also surprising that the generated code parser requires the name of the grammar start rule to be passed explicitly, while the dynamic parser does not.

The problem comes from the different options in a choice being able to return ASTs with completely different structures, which is the desirable behavior.

In this case it seems that the behavior of the dynamic parser is the correct one.

I agree that the behavior of the dynamic parser is the correct one. I think the difference in behavior comes from ParseContext._define doing nothing if the current AST node is empty. However, relaxing the condition at the beginning of _define causes some test cases to fail. I need to spend some more time analyzing what the code does.
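To make the suspected mechanism concrete, here is a toy model. It is not TatSu's code: ToyContext, parse_test and the skip_when_empty flag are hypothetical names invented for this sketch. It only assumes what is described above, namely that defining the rule's names is skipped when nothing has been named yet, and that a rule with no named elements returns what it matched.

```python
class ToyContext:
    """Toy stand-in for a parse context (hypothetical, not TatSu's ParseContext)."""

    def __init__(self, skip_when_empty):
        self.skip_when_empty = skip_when_empty
        self.ast = {}    # named elements collected so far
        self.cst = None  # last unnamed value, returned when nothing is named

    def add_token(self, token):
        self.cst = token

    def name(self, key, value):
        self.ast[key] = value

    def define(self, keys):
        # Assumed guard: do nothing if no named element has been seen yet.
        if self.skip_when_empty and not self.ast:
            return
        # Otherwise make sure every name in the rule is present.
        for key in keys:
            self.ast.setdefault(key, None)

    def result(self):
        return self.ast if self.ast else self.cst


def parse_test(tokens, skip_when_empty):
    """Hand-written model of: test = 'TEST' ['A' a:number] ['B' b:number] ;"""
    ctx = ToyContext(skip_when_empty)
    ctx.add_token(tokens[0])
    rest = tokens[1:]
    if rest[:1] == ['A']:
        ctx.name('a', rest[1])
        rest = rest[2:]
    if rest[:1] == ['B']:
        ctx.name('b', rest[1])
    ctx.define(['a', 'b'])
    return ctx.result()


print(parse_test(['TEST'], skip_when_empty=True))            # like the generated parser
print(parse_test(['TEST'], skip_when_empty=False))           # like the dynamic parser
print(parse_test(['TEST', 'A', '1'], skip_when_empty=True))  # both agree here
```

With the guard enabled the toy returns the bare token for 'TEST', and with it disabled it returns a dict holding both keys, which matches the shape of the discrepancy reported above. It says nothing about why relaxing the real condition breaks other test cases.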

The problem is that the expression doing the defines should be the first-level sequence in the right-hand side (rhs) of a rule. It would work for rules that have a choice as their main expression, like in this case.

Fixed in commit 852f657.

I was in a hurry, and skipped the pull request by mistake.