sebfisch / haskell-regexp

Regular Expression Matching in Haskell

Home Page:http://sebfisch.github.com/haskell-regexp/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

balanced sequences

sebfisch opened this issue · comments

Use a balanced sequence data type such as finger trees to represent sequences of regular expressions. As the Status type can be made a Monoid, finger trees may be good fit for our algorithm.

Using explicit associative operations may lead to new insights.

Maybe it is even helpful to make associativity explicit in the datatype such that concatenation and alternative are both associative. Like

data RegExp a = ... | Seq [RegExp a] | Alt [RegExp a]

Porting the implementation to such a representation may clarify where associativity is important and simplify a later change going from lists to finger trees.

As an aside, with the above representation Seq [] matches the empty word and Alt [] represents the empty language.

Instead of using a predefined balanced sequence type, we could use problem specific information to balance in a specific way. In order to skip inactive parts, we want that active leafs are not nested deep in otherwise inactive expressions. We want them to be at a high level. As active leafs travel from left to right it is good to start with a right associative tree. As active leafs jump from the right to the left end of repeated sequences and new activation starts at the left end, we probably want the left leaf to stay at a high level.

We regenerate the whole expression in each step and can thus change how associative operations are nested during the algorithm. Can we achieve a good custom balancing by rotating around leafs when activating them?

it's probably better to store the weights next to the regexp rather than inside it, to reduce garbage.