ure: Escaping does not work in character set
iBobik opened this issue
>>> import ure as re
>>> re.compile(r'\.')
>>> re.compile(r'[a\.]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: error in regex
Pycopy v3.1.5
Yes, Pycopy's ure is a minimalist regex module, in turn based on a minimalist regex library, https://github.com/pfalcon/re1.5 . The latter supports only the minimal required set of escapes. Any other escapes can be encoded at the Python level instead.
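A minimal sketch of what "encoding at the Python level" can look like, assuming an escape such as \t is not among those the regex engine itself understands (which escapes ure actually supports is documented in the ure docs, not here). With a non-raw string literal, Python converts the escape into the real character before the pattern is compiled, so the engine only ever sees a literal tab. The name tab_separated is just for illustration.

```python
import re  # on Pycopy this would be: import ure as re

# Non-raw string: Python itself turns "\t" into a real tab character,
# so the regex engine receives a literal tab and needs no escape support.
tab_separated = re.compile("a\tb")

print(tab_separated.match("a\tb") is not None)
```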
In this particular case, it makes no sense to escape a dot in a set: it can (and for ure, should) be written as is. The only char which can be quoted in a set is ] (as for -, it should just be written first in the set).
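The rules above can be sketched as a small helper that builds a character set from literal characters. This is only an illustration: make_char_set is a hypothetical name, not part of ure or re, and it deliberately refuses ^ and backslash, since their handling inside sets is not covered in this thread.

```python
import re  # on Pycopy this would be: import ure as re


def make_char_set(chars):
    """Build a character set pattern such as '[a.]' from literal characters."""
    if not chars:
        raise ValueError("empty character set")
    # '^' and backslash are left out of this sketch on purpose.
    unsupported = set(chars) & {"\\", "^"}
    if unsupported:
        raise ValueError("not handled by this sketch: %r" % sorted(unsupported))
    body = ""
    # A literal '-' is written first so it cannot be read as a range.
    if "-" in chars:
        body += "-"
    for ch in chars:
        if ch == "-":
            continue
        # ']' is the only character quoted with a backslash;
        # everything else, including '.', is written as is.
        body += "\\]" if ch == "]" else ch
    return "[" + body + "]"


dot_or_a = re.compile(make_char_set("a."))
print(make_char_set("a."), make_char_set("a-]"))
```

For the pattern from the original report, this yields [a.] instead of [a\.].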
Good point, docs are updated: https://pycopy.readthedocs.io/en/latest/library/ure.html . Note that escaping behavior outside character sets was already described previously.
The overall behavior is similar to that of recent CPython versions, which raise an error on unknown escapes:
Python 3.8.5 (default, Jul 20 2020, 19:48:14)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r"\g")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
re.error: bad escape \g at position 0
Except that CPython does that only for escapes outside char sets, whereas Pycopy does it inside them too ;-). And of course, Pycopy supports fewer escapes in the first place (but again, it errors out on any unsupported one).