balanced sequences

Question

balanced sequences

sebfisch opened this issue 14 years ago · comments

Sebastian Fischer commented 14 years ago

Use a balanced sequence data type such as finger trees to represent sequences of regular expressions. As the Status type can be made a Monoid, finger trees may be good fit for our algorithm.

Sebastian Fischer · Answer 1 · Mon Feb 15 2010 18:17:13 GMT+0800 (China Standard Time)

Using explicit associative operations may lead to new insights.

Maybe it is even helpful to make associativity explicit in the datatype such that concatenation and alternative are both associative. Like

data RegExp a = ... | Seq [RegExp a] | Alt [RegExp a]

Porting the implementation to such a representation may clarify where associativity is important and simplify a later change going from lists to finger trees.

As an aside, with the above representation Seq [] matches the empty word and Alt [] represents the empty language.

Sebastian Fischer · Answer 2 · Tue Feb 16 2010 18:58:36 GMT+0800 (China Standard Time)

Instead of using a predefined balanced sequence type, we could use problem specific information to balance in a specific way. In order to skip inactive parts, we want that active leafs are not nested deep in otherwise inactive expressions. We want them to be at a high level. As active leafs travel from left to right it is good to start with a right associative tree. As active leafs jump from the right to the left end of repeated sequences and new activation starts at the left end, we probably want the left leaf to stay at a high level.

We regenerate the whole expression in each step and can thus change how associative operations are nested during the algorithm. Can we achieve a good custom balancing by rotating around leafs when activating them?

Sebastian Fischer · Answer 3 · Mon Jul 26 2010 23:06:42 GMT+0800 (China Standard Time)

it's probably better to store the weights next to the regexp rather than inside it, to reduce garbage.