goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

Escapes in grammars: UTF sequences or single characters?

goodmami opened this issue · comments

The specification's section on escape sequences says that the \xNN, \uNNNN, and \UNNNNNNNN escapes are UTF-8, UTF-16, and UTF-32, respectively, but that may not be accurate. In Python string literals, for instance, '\xNN' is a single character whose code point is given by the two hexadecimal digits, so '\xNN\xNN' is two characters and not a sequence of two bytes.
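To make the distinction concrete, here is a small check of how Python itself treats these escapes. It only illustrates Python's literal semantics; it says nothing about how pe currently behaves.

```python
# In a str literal, each \xNN escape is one character (code point U+00NN).
s = '\xc3\xa9'
code_points = [hex(ord(c)) for c in s]  # ['0xc3', '0xa9']

# The same two values read as UTF-8 *bytes* decode to a single character.
b = b'\xc3\xa9'
decoded = b.decode('utf-8')  # 'é' (U+00E9)

# \uNNNN and \UNNNNNNNN likewise name a single code point each.
same_char = '\u00e9' == '\xe9' == '\U000000e9'  # True
```

Under the "UTF sequence" reading, `\xc3\xa9` in a grammar would mean the one character é; under the "single character" reading it would mean the two characters Ã and ©.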

Treating each escape as a single character would at least make grammars easier to parse. Whichever way this goes, tests are needed.
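As a rough sketch of the single-character interpretation, each escape can be replaced independently by the character at that code point. The `unescape` function and `_ESCAPE` pattern below are hypothetical names for illustration and are not part of pe's API.

```python
import re

# Hypothetical helper, not pe's implementation. Matches \xNN, \uNNNN,
# \UNNNNNNNN, or an escaped backslash (so that \\x41 is left alone).
_ESCAPE = re.compile(
    r'\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|\\)'
)


def unescape(text: str) -> str:
    """Replace each hex escape with the single character it names."""

    def _replace(match: "re.Match[str]") -> str:
        digits = match.group(1) or match.group(2) or match.group(3)
        if digits is None:
            # Escaped backslash: keep it untouched in this sketch.
            return match.group(0)
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(digits, 16))

    return _ESCAPE.sub(_replace, text)
```

Tests for this interpretation might then assert that `unescape(r'\xc3\xa9')` has length 2, while `unescape(r'\u00e9')` and `unescape(r'\U0001F600')` each have length 1.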