Regex broken after v5.7.3
pressureless opened this issue
The output for the following code with v5.7.3 is:
tatsu: 5.7.3
# SIMPLE PARSE
# AST
( {'value': 'a1'},
'+',
{'value': 'a2'})
# JSON
[
{
"value": "a1"
},
"+",
{
"value": "a2"
}
]
The output for the same code with v5.8.3 is:
tatsu: 5.8.3
# SIMPLE PARSE
# AST
( {'value': '1'},
'+',
{'value': '2'})
# JSON
[
{
"value": "1"
},
"+",
{
"value": "2"
}
]
import json
from pprint import pprint
import tatsu
print("tatsu: {}".format(tatsu.__version__))
GRAMMAR = r"""@@grammar::CALC
start
    =
    expression $
    ;
expression
    =
    | expression '+' ~ term
    | expression '-' ~ term
    | term
    ;
term
    =
    | term '*' ~ factor
    | term '/' ~ factor
    | factor
    ;
factor
    =
    | '(' ~ expression ')'
    | number
    ;
number
    = value:/[A-Za-z]([A-Za-z0-9]*)/
    ;"""


def simple_parse():
    grammar = GRAMMAR
    parser = tatsu.compile(grammar)
    ast = parser.parse('a1 + a2')
    print('# SIMPLE PARSE')
    print('# AST')
    pprint(ast, width=20, indent=4)
    print()
    print('# JSON')
    print(json.dumps(ast, indent=4))


if __name__ == '__main__':
    simple_parse()
This is an intentional change; see the 5.8.0 entry in the change log: https://github.com/neogeny/TatSu/blob/v5.8.3/CHANGELOG.rst#580--2022-03-12
Honor grouping in pattern expressions with the semantics of re.findall(pattern, text)[0]. Now groups that should not be returned when parsing should use the (?:) syntax.
Following the change log entry:
>>> re.findall('[A-Za-z]([A-Za-z0-9]*)', 'a1')[0]
'1'
You need to modify the regular expression as follows:
>>> re.findall('[A-Za-z](?:[A-Za-z0-9]*)', 'a1')[0]
'a1'
or just drop the grouping, as it does not seem to be needed in this regular expression:
>>> re.findall('[A-Za-z][A-Za-z0-9]*', 'a1')[0]
'a1'
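To check other patterns in a grammar for the same problem, a small sketch like the one below may help. It uses only the standard re module; the helper names first_findall and capturing_group_count are made up for this illustration and are not part of TatSu. It shows what re.findall(pattern, text)[0] returns for zero, one, and two capturing groups, and uses the groups attribute of a compiled pattern to flag patterns that contain capturing groups. How TatSu itself surfaces the multi-group (tuple) case is not covered in this thread, so treat that line as plain re behavior only.

```python
import re


def first_findall(pattern, text):
    """Return re.findall(pattern, text)[0], or None when nothing matches.

    Hypothetical helper: mirrors the semantics quoted from the change log.
    """
    matches = re.findall(pattern, text)
    return matches[0] if matches else None


def capturing_group_count(pattern):
    """Count capturing groups; (?:...) groups are not counted.

    Hypothetical helper: a count above zero means findall returns the
    group contents instead of the whole match.
    """
    return re.compile(pattern).groups


patterns = [
    r'[A-Za-z][A-Za-z0-9]*',      # no groups: whole match is returned
    r'[A-Za-z](?:[A-Za-z0-9]*)',  # non-capturing group: whole match
    r'[A-Za-z]([A-Za-z0-9]*)',    # one capturing group: only the group
    r'([A-Za-z])([A-Za-z0-9]*)',  # two capturing groups: a tuple
]

for pattern in patterns:
    print(capturing_group_count(pattern),
          repr(first_findall(pattern, 'a1')),
          pattern)
```

Any pattern reported with a non-zero group count is a candidate for rewriting with (?:) or for dropping the parentheses, as suggested above.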
I saw that change, but didn't get it. Thank you for the clarification!