pfalcon / pycopy

Pycopy - a minimalist and memory-efficient Python dialect. Good for desktop, cloud, constrained systems, microcontrollers, and just everything.

Home Page: http://pycopy.readthedocs.io

ure: Escaping does not work in character set

iBobik opened this issue

>>> import ure as re
>>> re.compile(r'\.')
>>> re.compile(r'[a\.]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: error in regex

Pycopy v3.1.5

Yes, Pycopy's ure is a minimalist regex module, in turn based on a minimalist regex library, https://github.com/pfalcon/re1.5 . The latter supports only a minimal set of escapes. Any other escape can be encoded at the Python level instead.
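To illustrate what "encoded at the Python level" can look like, here is a minimal sketch. It assumes, purely for illustration, that \t is one of the escapes the regex engine does not handle (the thread does not list which ones are supported). The names TAB_SEPARATED and matches_tab_separated are made up for this example, and the code uses CPython's re so it can be run anywhere; on Pycopy the import would be ure.

```python
import re  # CPython's module; on Pycopy this would be: import ure as re

# Non-raw string literal: Python itself turns \t into a real tab character,
# so the regex engine only sees a literal tab and never an escape sequence.
# (Assumption for illustration: \t is not handled by the regex engine.)
TAB_SEPARATED = "a\tb"


def matches_tab_separated(text):
    """Return True if text starts with 'a', a tab, then 'b'."""
    return re.match(TAB_SEPARATED, text) is not None
```

The point is only that the pattern string holds three characters (a, tab, b) before the regex module ever parses it, so no regex-level escape is involved.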

In this particular case, it makes no sense to escape a dot in a set; it can (and for ure, should) be written as is. The only character that can be quoted in a set is ]. A -, by contrast, should simply be written first in the set.
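The rules above can be sketched as a small helper that builds a character set from a string of literal characters. This is only a sketch: char_set is a hypothetical name, not part of ure, and since the thread does not say how ^ or a backslash behave inside a set, the helper simply rejects them. It is checked here against CPython's re, which accepts the same spelling; it has not been run against ure itself.

```python
import re  # CPython's module; on Pycopy this would be: import ure as re


def char_set(chars):
    """Build a character-set pattern matching any one character in chars.

    Follows the rules from the comment above: '-' goes first, ']' is the
    only quoted character, everything else (including '.') is written as is.
    """
    for c in chars:
        if c in "^\\":
            # Not covered in the thread, so this sketch refuses to guess.
            raise ValueError("unsupported character in set: %r" % c)
    body = ""
    if "-" in chars:
        body += "-"        # a literal hyphen is written first in the set
    for c in chars:
        if c == "-":
            continue       # already emitted at the front
        if c == "]":
            body += "\\]"  # the one character that is quoted
        else:
            body += c      # no escaping, e.g. '.' stays '.'
    return "[" + body + "]"


# The pattern from the report, written without the backslash.
DOT_OR_A = re.compile(char_set("a."))
```

With this, char_set("a.") yields [a.], the spelling recommended above in place of [a\.].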

Good point, the docs are updated: https://pycopy.readthedocs.io/en/latest/library/ure.html . Note that escaping behavior outside character sets was already described there.

Overall, the behavior is similar to recent versions of CPython, which raise an error on unknown escapes:

Python 3.8.5 (default, Jul 20 2020, 19:48:14) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r"\g")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
re.error: bad escape \g at position 0

Except that CPython does that only for escapes outside char sets, while Pycopy does it inside them too ;-). And of course, Pycopy supports fewer escapes in the first place (but again, it errors out on any unsupported one).