sebfisch / haskell-regexp

Regular Expression Matching in Haskell

Home Page:http://sebfisch.github.com/haskell-regexp/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

more efficient text representation

sebfisch opened this issue · comments

Add support for more efficient representations of strings as provided by the text and/or bytestring packages.

maybe stream-fusion can be used to improve efficiency simply by importing Data.List.Stream instead of using Data.List functions in the implementation of the matcher.

Using Data.List.Stream did not improve performance when searching for :.{4000}: in all Haskell files. Using Data.Stream made performance worse.

Check whether the new implementation of the vector package speeds up our algorithm.

it may be worthwhile to repeat the tests with the new implementation and simple regexps to measure raw search speed.

Maybe it is a good idea to provide an interface to both strict and lazy bytestrings.

The current implementation uses lazy strings and, thus, requires only constant space: the input can be streamed. I expect the same behaviour from an implementation using lazy bytestrings. For small input (like lines of a text file) it may be faster to not support streaming and read the input into a strict bytestring instead.

It is also worth trying to replace lazy strings with enumerators.

have some of these ideas been experimented with as yet?

Apart from stream fusion, I also tried Data.Text without benefit. However, I did not profile to see whether things can be improved - only changed the implementation to use the different interface. All this was done before new versions of Data.Text and GHC came out.