composewell / unicode-transforms

Fast Unicode normalization in Haskell

Use full decompositions in decomposition map

harendra-kumar opened this issue · comments

Currently we use a recursive decompose loop that keeps decomposing a character until it is no longer decomposable. This requires multiple lookups in the decomposable bitmap, and the loop itself adds to the cost. Instead, we can statically generate the fully decomposed sequence for each character, so that the run-time logic no longer needs a recursive loop. This could speed up NFD/NFC normalization for several languages that involve composed forms (e.g. Devanagari and Japanese).
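To make the proposal concrete, here is a minimal sketch of the two approaches. It is not the library's actual code: `decomposeStep`, `decomposeRec`, `fullTable` and `decomposeFull` are hypothetical names, the table holds only four entries taken from UnicodeData.txt, and a plain association list stands in for whatever generated lookup structure the library would really use.

```haskell
-- Hypothetical single-step canonical decomposition table: each entry
-- expands a character by one level only (tiny subset of UnicodeData.txt).
decomposeStep :: Char -> Maybe [Char]
decomposeStep '\x1E69' = Just "\x1E63\x0307" -- s with dot below and dot above
decomposeStep '\x1E63' = Just "s\x0323"      -- s with dot below
decomposeStep '\x01D5' = Just "\x00DC\x0304" -- U with diaeresis and macron
decomposeStep '\x00DC' = Just "U\x0308"      -- U with diaeresis
decomposeStep _        = Nothing

-- Run-time recursive approach: keep expanding until nothing decomposes.
-- Every level costs another lookup.
decomposeRec :: Char -> [Char]
decomposeRec c = case decomposeStep c of
  Nothing -> [c]
  Just cs -> concatMap decomposeRec cs

-- Proposed approach: compute the full decompositions once (in the real
-- library this would happen at code-generation time), so that run-time
-- decomposition is a single lookup with no recursion.
fullTable :: [(Char, [Char])]
fullTable = [ (c, decomposeRec c) | c <- "\x1E69\x1E63\x01D5\x00DC" ]

decomposeFull :: Char -> [Char]
decomposeFull c = maybe [c] id (lookup c fullTable)
```

In this sketch `decomposeFull '\x1E69'` yields the three code points `s`, U+0323, U+0307 with one lookup, where `decomposeRec` needs a lookup at every level. Canonical reordering of combining marks is deliberately left out; it would still have to run after decomposition in either variant.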

I tried this idea last week but did not see any performance improvement. Full decompositions might get long-ish; maybe it is better to return them as arrays rather than as [Char]?

I removed the recursive decompose altogether as an experiment, and it does not seem to help at all.