goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

Use of [, ], and - in character classes needs refinement

goodmami opened this issue

The specification is very detailed about how the - character is interpreted inside a character class (link), but it isn't entirely accurate or complete.

It says that a - at the end of a character class is a literal character, but that does not hold when the class has two characters (e.g., [+-]). In that case, the - in fact forms a range up to the ] character, which is treated literally in this context. As a result, the character class is not terminated where the user expects it to be.
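To make the behavior concrete, here is a minimal sketch of the relevant rules from the original PEG grammar (Class <- '[' (!']' Range)* ']' and Range <- Char '-' Char / Char), written as a hand-rolled scanner. It does not use pe itself; the names parse_char and parse_class are hypothetical, and escape handling is simplified to "a backslash escapes the next character" without translating sequences such as \n.

```python
def parse_char(s, i):
    """Char: a backslash escape (simplified) or any single character."""
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "\\":
        if i + 1 >= len(s):
            raise ValueError("dangling backslash")
        return s[i + 1], i + 2  # escaped character, taken literally
    return s[i], i + 1


def parse_class(s, i=0):
    """Class <- '[' (!']' Range)* ']'; returns (ranges, end_index)."""
    if s[i:i + 1] != "[":
        raise ValueError("expected '['")
    i += 1
    ranges = []
    while i < len(s) and s[i] != "]":
        lo, j = parse_char(s, i)
        # Range <- Char '-' Char: the upper bound is ANY Char,
        # so an unescaped ']' is accepted here as a literal.
        if s[j:j + 1] == "-" and j + 1 < len(s):
            hi, k = parse_char(s, j + 1)
            ranges.append((lo, hi))
            i = k
        else:
            # Range <- Char: a single character
            ranges.append((lo, lo))
            i = j
    if i >= len(s):
        raise ValueError("unterminated character class")
    return ranges, i + 1


# The class swallows its intended terminator and keeps going:
print(parse_class("[+-] [a]"))
# Escaping either character expresses the intent unambiguously:
print(parse_class("[+\\-]"))   # literal '+' and literal '-'
print(parse_class("[+-\\]]"))  # explicit range from '+' to ']'
```

Under these assumptions, [+-] on its own fails as unterminated, and [+-] [a] is read as a single class that ends at the second ].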

The trouble is that this is how the PEG grammar is defined, so changing it would be a non-monotonic extension (current extensions only add to the syntax; they do not take anything away). This might be a good case for a warn action. If the user wants to silence the warning, they can escape the - or the ], depending on what they intended.
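One way the warning idea could look, sketched with Python's standard warnings module rather than pe's own action machinery: the scanner below keeps the grammar's behavior unchanged but warns whenever the upper bound of a range is an unescaped ]. The function name check_class and the warning text are hypothetical, and escapes are again simplified to "a backslash escapes the next character".

```python
import warnings


def check_class(source):
    """Scan a character class starting at source[0] == '['.

    Warns when a range's upper bound is an unescaped ']', then
    returns the index just past the closing ']'.
    """
    if source[:1] != "[":
        raise ValueError("expected '['")
    i = 1
    while i < len(source) and source[i] != "]":
        # Consume the lower bound (a backslash escapes one character).
        i += 2 if source[i] == "\\" else 1
        if source[i:i + 1] == "-" and i + 1 < len(source):
            if source[i + 1] == "]":
                warnings.warn(
                    "']' used as the upper bound of a range; "
                    "escape the '-' or the ']' to silence this warning"
                )
            i += 1  # skip the '-'
            # Consume the upper bound.
            i += 2 if source[i] == "\\" else 1
    if i >= len(source):
        raise ValueError("unterminated character class")
    return i + 1


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_class("[+-] ]")    # ambiguous: warns
    check_class("[+\\-]")    # escaped '-': silent
    check_class("[+-\\]]")   # escaped ']': silent
print(len(caught))
```

Escaping either character leaves the warning silent, which matches the two intents described above: a literal - or an explicit range to ].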

The same solution might be useful for multiline strings, which are often caused by a missing close-quote.