textX / Arpeggio

Parser interpreter based on PEG grammars written in Python http://textx.github.io/Arpeggio/

incorrect behavior when parsing unordered group in clean PEG

kdahlhaus opened this issue

The unordered group will not be parsed correctly in a multi-line grammar unless a backslash is the last character on the line in the grammar.

This is how you would expect to write the grammar (no backslash):

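```python
from arpeggio.cleanpeg import ParserPEG
# Fails on Arpeggio 1.7.1: the unordered group is not recognized.
print(ParserPEG("""
  letters = "{" ("a" "b")#  "}"
""", "letters").parse(""" { b a } """))
```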

This incorrectly throws an exception: arpeggio.NoMatch: Expected 'a' at position (1, 4) => ' { *b a } '.
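The error is what you would expect if the parser saw an ordered sequence instead of an unordered group. A sequence ("a" "b") must match 'a' first, so it fails on the 'b' at column 4. An unordered group ("a" "b")# instead matches each element exactly once, in any order. Below is a simplified, token-level sketch of those two semantics in plain Python. The function names are hypothetical, and this is not how Arpeggio implements them internally.

```python
def match_sequence(tokens, expected):
    """Ordered sequence: the expected tokens must appear in exactly this order.
    Returns None on success, or an error message on failure."""
    for pos, want in enumerate(expected):
        if pos >= len(tokens) or tokens[pos] != want:
            return f"Expected {want!r} at token {pos}"
    return None


def match_unordered(tokens, expected):
    """Unordered group: each expected token must appear exactly once, in any order.
    Returns None on success, or an error message on failure."""
    remaining = list(expected)
    for pos, tok in enumerate(tokens[:len(expected)]):
        if tok not in remaining:
            return f"Unexpected {tok!r} at token {pos}"
        remaining.remove(tok)  # each element may match only once
    if remaining:
        return f"Missing {remaining!r}"
    return None


# Tokens between the braces of the input " { b a } ".
tokens = ["b", "a"]
print(match_sequence(tokens, ["a", "b"]))   # Expected 'a' at token 0
print(match_unordered(tokens, ["a", "b"]))  # None
```

The sequence version reproduces the shape of the reported error, while the unordered version accepts "b a" just as the working grammar does.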

Adding a backslash as the last character on the line fixes the parsing:

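```python
from arpeggio.cleanpeg import ParserPEG
# Same grammar, but with a backslash as the last character of the rule line.
print(ParserPEG("""
  letters = "{" ("a" "b")#  "}" \
""", "letters").parse(""" { b a } """))
```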

This correctly prints: { | b | a | }
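Why would a trailing backslash help? In a regular (non-raw) Python string literal, a backslash at the end of a line is a line continuation: both the backslash and the newline are dropped. As a result, the grammar text passed to ParserPEG no longer has a newline after the #. The `#.*\n` pattern below is a hypothetical stand-in for the old cleanpeg comment rule. The claim that the real rule needed a trailing newline is an assumption and is not confirmed in this thread.

```python
import re

# Grammar text as in the first example: a newline follows the rule.
plain = """
  letters = "{" ("a" "b")#  "}"
"""

# Grammar text as in the second example: the trailing backslash is a line
# continuation, so Python drops both the backslash and the newline.
continued = """
  letters = "{" ("a" "b")#  "}" \
"""

# Hypothetical comment rule that only matches when a newline follows the '#'
# (an assumption about the old cleanpeg comment pattern, not its actual source).
comment_needing_newline = re.compile(r"#.*\n")

print(repr(plain))
print(repr(continued))
print(comment_needing_newline.search(plain) is not None)      # True
print(comment_needing_newline.search(continued) is not None)  # False
```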

I'm using Arpeggio 1.7.1, installed from pip, under Python 2.7 on Windows.

Here's a unit test that shows the problem, and how to work around it for a single-line grammar by reordering the grammar. I hope that workaround helps to pin down the problem. Just rename the file back to a .py extension.

test_unordered_group.py.txt

Thanks for reporting. I've verified it. It is a bug that seems to affect only cleanpeg notation.

The problem was a conflict between the unordered group operator symbol # and the line comment in cleanpeg notation, which unfortunately used the same symbol. I went for the easiest and most pragmatic solution, although it introduces a slight backward-incompatible change for cleanpeg grammars: the line comment syntax is now changed from #... to //..., as in regular peg notation. The fix is on the master branch, so you can test it and report back if you have any more problems with this.
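The following sketch is a simplified illustration of that conflict. It uses a naive regex-based comment stripper in plain Python. The patterns are hypothetical stand-ins, not Arpeggio's actual comment rules, and unlike a real parser this sketch ignores string literals. With # as the comment marker, everything from the unordered group operator to the end of the line is discarded. What remains is the ordered sequence "{" ("a" "b"), which is consistent with the "Expected 'a'" error in the report. With // as the comment marker, the rule survives intact.

```python
import re

# Hypothetical line-comment patterns; Arpeggio's real cleanpeg rules may differ.
HASH_COMMENT = re.compile(r"#.*")    # old cleanpeg: '#' starts a line comment
SLASH_COMMENT = re.compile(r"//.*")  # new cleanpeg, as in regular peg notation


def strip_comments(grammar, pattern):
    """Naively delete every match of a line-comment pattern.

    Simplification: a real grammar parser would not treat comment markers
    inside string literals as comments; this sketch does not check for that.
    """
    return pattern.sub("", grammar)


rule = 'letters = "{" ("a" "b")#  "}"'

# '#' as comment marker: the unordered-group operator and the closing brace vanish.
print(strip_comments(rule, HASH_COMMENT))   # letters = "{" ("a" "b")

# '//' as comment marker: the rule is untouched and '#' stays an operator.
print(strip_comments(rule, SLASH_COMMENT))  # letters = "{" ("a" "b")#  "}"

# Comments written in the new style are still removed.
print(strip_comments(rule + "  // any order", SLASH_COMMENT))
```

Switching the comment marker removes the ambiguity entirely, so # can be an operator anywhere on a line.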

commented

Or the unordered group operator could be a double #, like ##, or any other operator not used yet, like ~, ^, etc.

'#' is a good symbol for comments, just my thought :)

Yeah, we could change it, but that would be a backward-incompatible change that is harder to track down in complex grammars, and it would make cleanpeg syntax more different from regular peg, which is a bad thing IMHO. This change to comments actually makes cleanpeg more similar to regular peg without reducing the readability of cleanpeg. textX also uses # for unordered groups, which is another good reason to follow that notation.

Anyway, Arpeggio makes it relatively easy to make your own grammar language notation by following how it is done for peg/cleanpeg :)

commented

Okay, yeah, agreed!

That fix passes my unit test and works in my project. (FYI: the version is still at 1.7.1.)

Thanks!