incorrect behavior when parsing unordered group in clean PEG
kdahlhaus opened this issue
The unordered group will not be parsed correctly in a multi-line grammar unless a backslash is the last character on the line in the grammar.
This is how you would expect to write the grammar (no backslash):
from arpeggio.cleanpeg import ParserPEG
print(ParserPEG("""
letters = "{" ("a" "b")# "}"
""", "letters").parse(""" { b a } """))
This incorrectly throws an exception: arpeggio.NoMatch: Expected 'a' at position (1, 4) => ' { *b a } '.
Adding a backslash as the last character on the line fixes the parsing:
from arpeggio.cleanpeg import ParserPEG
print(ParserPEG("""
letters = "{" ("a" "b")# "}" \
""", "letters").parse(""" { b a } """))
This correctly prints: { | b | a | }
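One detail that may help narrow this down: in a regular (non-raw) Python string literal, a backslash followed by a newline is a line continuation, so neither character ends up in the string. The small sketch below uses only the two grammar strings from this report (the variable names are just for illustration) and shows that the only difference the parser actually sees is a missing trailing newline after the rule.

```python
# The two grammar strings from the report, as plain triple-quoted literals.
plain = """
letters = "{" ("a" "b")# "}"
"""

# The backslash-newline pair is consumed by Python itself, so the grammar
# text handed to the parser ends right after the space following "}".
with_backslash = """
letters = "{" ("a" "b")# "}" \
"""

print(repr(plain))
print(repr(with_backslash))
```

So the workaround does not pass a backslash to the grammar at all; it only removes the newline at the end of the rule line.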
I'm using Arpeggio 1.7.1 installed from pip under Python 2.7 in Windows.
Here's a unit test that shows the problem and how to work around it for a single-line grammar by re-ordering the grammar. I hope that workaround helps to identify the problem. Just rename the file back to a Python extension.
Thanks for reporting. I've verified it. It is a bug that seems to affect only cleanpeg notation.
The problem was a conflict between the unordered group operator symbol # and the line comment in cleanpeg notation, which unfortunately used the same symbol. I went with the easiest and most pragmatic solution, although it introduces a slightly backward-incompatible change for cleanpeg grammars: the line comment pattern is now changed from #... to //..., as in regular peg notation. The fix is on the master branch, so you can test it and report back if you have any more problems with this.
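To illustrate the conflict, here is a minimal sketch using only Python's `re` module. The two patterns are assumed stand-ins for the old and new line-comment rules (the actual definitions in Arpeggio's cleanpeg module may differ in detail), and `skip_comments` is a hypothetical helper that only approximates what a parser sees after skipping comments.

```python
import re

# Assumed stand-ins for the old and new line-comment patterns; these are
# illustrative and not copied from Arpeggio's source.
OLD_COMMENT = re.compile(r"#.*\n")
NEW_COMMENT = re.compile(r"//.*\n")


def skip_comments(grammar, comment_re):
    """Drop comment text but keep the line break (hypothetical helper)."""
    return comment_re.sub("\n", grammar)


rule = 'letters = "{" ("a" "b")# "}"\n'

# Old pattern: everything from '#' to the end of the line is taken as a
# comment, so the unordered group operator and the closing "}" vanish.
old_view = skip_comments(rule, OLD_COMMENT)

# New pattern: '#' is no longer a comment start, so the rule is untouched.
new_view = skip_comments(rule, NEW_COMMENT)

# Without a trailing newline the assumed old pattern cannot match at all.
no_newline_view = skip_comments(rule.rstrip("\n"), OLD_COMMENT)

print(repr(old_view))
print(repr(new_view))
print(repr(no_newline_view))
```

Under these assumptions, the old pattern reduces the rule to an ordered sequence `"a" "b"` with no closing brace, which would be consistent with the reported `Expected 'a'` error. The last case suggests why removing the newline at the end of the rule line (the backslash workaround) let the `#` operator through.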
Or the unordered group operator could be a double #, like ##, or any other operator not used yet, like ~ or ^.
'#' is a good operator for comments, just my thought :)
Yeah, we could change it, but that would be a backward-incompatible change that is harder to track down in complex grammars, and it would make cleanpeg syntax differ more from regular peg, which is a bad thing IMHO. This change to the comment syntax actually makes cleanpeg more similar to regular peg without reducing its readability. textX also uses # for unordered groups, which is another good reason to follow that notation.
Anyway, Arpeggio makes it relatively easy to make your own grammar language notation by following how it is done for peg/cleanpeg :)
Okay, Yea agreed!
That fix passes my unit test and works in my project. (FYI: the version is still at 1.7.1.)
Thanks!