chocoteam / choco-solver

An open-source Java library for Constraint Programming

Home Page: http://choco-solver.org/

Confusing regex behavior with letters

aengelberg opened this issue · comments

The Automaton class allows me to use letters in the regular expression, but the regular constraint then constrains a var to that character's ASCII code minus 7. For example, given the constraint:

ICF.regular(new IntVar[]{x}, new FiniteAutomaton("a"))

The solution will be:

x = 90

whereas I would expect 97. I see here that some sort of offset is built up in the mapping from chars to integers because it skips over certain special characters ], [, etc.

So I understand why this behavior is happening, but I'm curious why it must be designed this way, or whether some validation (or documentation) could be added to help new users like me avoid this.

It sounds like the workaround is to replace each letter with <xx>, where xx is the actual ASCII code I'm looking for.
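For illustration, here is a minimal sketch of that workaround as a string rewrite. `LetterRegex` and `toChevrons` are hypothetical names, not part of Choco. The sketch assumes the regexp contains only plain ASCII letters and operators; any digits already present are left untouched, so they would still be read as numbers.

```java
/** Hypothetical helper, not part of Choco: rewrites letters as <code>. */
final class LetterRegex {

    private LetterRegex() {}

    /**
     * Replaces each ASCII letter in the regexp with its character code
     * wrapped in chevrons, e.g. "a" becomes "<97>".
     * Every other character (operators, digits, parentheses) is copied as-is.
     */
    static String toChevrons(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (asciiLetter) {
                out.append('<').append((int) c).append('>');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The rewritten string, for example `LetterRegex.toChevrons("(a|b)*c")`, is what I would then pass to the automaton instead of the letter-based one.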

--Alex

Hi,

This has to do with #156.

This is a common misunderstanding: the REGULAR constraint works on integer variables and can only handle regexps built from digits or numbers (surrounded by chevrons), not characters.
Indeed, Choco manages integer variables (whose assigned values are integers) but not character variables (whose assigned values would be characters).
So if you need to link letters (characters) to integers, you should do it yourself, as a preprocessing step.
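One possible shape for such a preprocessing step is sketched below. It assumes the simplest mapping, where each character is represented by its own character code; `LetterCodec`, `encode` and `decode` are hypothetical names and are not provided by Choco.

```java
/** Hypothetical helper, not part of Choco: maps characters to integers and back. */
final class LetterCodec {

    private LetterCodec() {}

    /** Turns a word into the integer values its variables should take. */
    static int[] encode(String word) {
        int[] values = new int[word.length()];
        for (int i = 0; i < word.length(); i++) {
            values[i] = word.charAt(i); // char widens to its integer code
        }
        return values;
    }

    /** Turns integer values (e.g. read from a solution) back into a word. */
    static String decode(int[] values) {
        StringBuilder sb = new StringBuilder(values.length);
        for (int v : values) {
            sb.append((char) v);
        }
        return sb.toString();
    }
}
```

With this mapping, the word "ab" corresponds to the values 97 and 98, so the matching regexp would be written with those numbers in chevrons rather than with the letters themselves.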

Note that the underlying automaton the REGULAR constraint deals with is character-based because of the dependency on http://www.brics.dk/automaton/, but this is an implementation concern only.
Some of the characters are excluded (&?*+{},[].#@"<>) but the remaining ones are used to encode the digits/numbers declared in the regexp.

Hope it helps,
CP