Common optimization misbehaving on character classes
goodmami opened this issue · comments
Michael Wayne Goodman commented
Something is going on with common optimizations on character classes:
>>> import pe
>>> print(pe.compile("[ab] / [bc]", flags=pe.COMMON).grammar)
Start <- [abbc] / [bc]
This happens when the pe.COMMON optimization is applied to a choice of character classes. The issue becomes more pronounced when nonterminals are involved and the pe.INLINE optimization is also enabled.
>>> grammar = """
... Start <- A / B / Foo
... A <- [Aa]
... B <- [Bb]
... Foo <- A / B / [Cc]
... """
>>> print(pe.compile(grammar, flags=pe.INLINE|pe.COMMON).grammar)
Start <- A / B / Foo
A <- [AaBbAaBbBbCcBbCc]
B <- [Bb]
Foo <- A / B / [Cc]
Michael Wayne Goodman commented
Note that the output above is supposed to be the original grammar, printed after optimization, not the optimized grammar. For [ab] / [bc], we expect the optimized version to be [abbc] or even [abc], but the original grammar should remain [ab] / [bc]. This suggests that some mutable data structures are being modified in place during optimization.
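To illustrate the kind of aliasing that could produce this symptom, here is a minimal sketch. It does not use pe's internals: CharClass, merge_in_place, and merge_copy are hypothetical names made up for the example, and the real cause in pe may differ. The in-place version reuses the first node's list, so the "original" grammar ends up as [abbc] / [bc]; the copying version leaves it alone.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a grammar node; this is NOT pe's real class.
@dataclass
class CharClass:
    ranges: list = field(default_factory=list)


def merge_in_place(choices):
    """Suspected bug pattern: reuse the first node and extend its list."""
    merged = choices[0]  # alias, not a copy
    for other in choices[1:]:
        merged.ranges.extend(other.ranges)  # mutates the original node
    return merged


def merge_copy(choices):
    """Safer pattern: build a fresh node and leave the inputs untouched."""
    ranges = []
    for choice in choices:
        ranges.extend(choice.ranges)
    return CharClass(ranges)


# Copying merge: the original choice [ab] / [bc] survives.
original = [CharClass(["a", "b"]), CharClass(["b", "c"])]
optimized = merge_copy(original)

# In-place merge: the first alternative of the original is overwritten.
buggy_original = [CharClass(["a", "b"]), CharClass(["b", "c"])]
merge_in_place(buggy_original)

print("".join(original[0].ranges))        # ab
print("".join(buggy_original[0].ranges))  # abbc
```

If something like this is the cause, copying the lists (or using tuples) when building optimized definitions would keep the original grammar intact.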