goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

Common optimization misbehaving on character classes

goodmami opened this issue · comments

The common optimizations are misbehaving on character classes:

>>> import pe
>>> print(pe.compile("[ab] / [bc]", flags=pe.COMMON).grammar)
Start <- [abbc] / [bc]

This happens when the pe.COMMON optimization is applied to a choice of character classes. The issue becomes more pronounced when nonterminals are involved and the pe.INLINE optimization is also enabled.

>>> grammar = """
... Start <- A / B / Foo
... A <- [Aa]
... B <- [Bb]
... Foo <- A / B / [Cc]
... """
>>> print(pe.compile(grammar, flags=pe.INLINE|pe.COMMON).grammar)
Start <- A / B / Foo
A     <- [AaBbAaBbBbCcBbCc]
B     <- [Bb]
Foo   <- A / B / [Cc]

Note that above I'm showing the original grammar as it appears after optimization, not the optimized grammar. For [ab] / [bc], we expect the optimized version to be [abbc] or even [abc], but the original grammar should remain [ab] / [bc]. This suggests that the optimization is mutating data structures that it shares with the original grammar instead of working on copies.
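
To illustrate the kind of aliasing I suspect, here is a minimal sketch that does not use pe's internals at all. `CharClass`, `merge_in_place`, and `merge_copy` are hypothetical names invented for this example; the real cause in pe may differ.

```python
from dataclasses import dataclass


@dataclass
class CharClass:
    # Hypothetical stand-in for a character-class node; not pe's real type.
    chars: list


def merge_in_place(choice):
    """Suspected bug pattern: reuse and mutate the first alternative."""
    merged = choice[0]
    for alt in choice[1:]:
        merged.chars.extend(alt.chars)  # also changes the original node
    return merged


def merge_copy(choice):
    """Safer pattern: build a new node and leave the originals alone."""
    chars = []
    for alt in choice:
        chars.extend(alt.chars)
    return CharClass(chars)


# In-place merge: the "original" first alternative is changed as a side effect.
original = [CharClass(list("ab")), CharClass(list("bc"))]
merge_in_place(original)
buggy_original = "".join(original[0].chars)

# Copying merge: the original is untouched, only the new node is merged.
original = [CharClass(list("ab")), CharClass(list("bc"))]
optimized = merge_copy(original)
safe_original = "".join(original[0].chars)
safe_optimized = "".join(optimized.chars)

print(buggy_original, safe_original, safe_optimized)
```

With the in-place version the first alternative ends up as `abbc`, matching the `[abbc] / [bc]` output above, while the copying version leaves it as `ab`. Applying an in-place merge repeatedly through inlined nonterminals would also compound the duplication, which is consistent with what the second example shows.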