composewell / unicode-transforms

Fast Unicode normalization in Haskell

Speed up division operation in hangul decomposition

harendra-kumar opened this issue

Just as we used a custom division operation for composition, we can replace the quotRem operations in Hangul decomposition with a custom division operation. Here is an example:

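import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.))

-- Quotient and remainder of division by 28. Input must be non-negative.
divBy28 :: Int -> (Int, Int)
divBy28 n = go 0 n
    where
    -- Invariant: n == 28 * k + i. Writing i = 32 * q + r gives
    -- i = 28 * q + (4 * q + r), so q moves into k and 4 * q + r remains.
    go k i =
        let (q, r) = divBy32 i
        in if q == 0
           then
                -- Here i == r < 32, so at most one more 28 fits.
                if r >= 28
                then (k + 1, r - 28)
                else (k, r)
           else go (k + q) (q `unsafeShiftL` 2 + r)

    -- Division by 32 is a shift and a mask.
    divBy32 x =
        let q = x `unsafeShiftR` 5
            r = x .&. 31
        in (q, r)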
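
For context, here is a minimal sketch of where these divisions come from, assuming the arithmetic Hangul syllable decomposition from the Unicode Standard. The name `decomposeHangul` and the exact split into two `quotRem` calls are assumptions for illustration, not necessarily how the library structures it:

```haskell
import Data.Char (chr, ord)

-- Constants from the Unicode Standard's Hangul syllable algorithm.
sBase, lBase, vBase, tBase, vCount, tCount, sCount :: Int
sBase  = 0xAC00
lBase  = 0x1100
vBase  = 0x1161
tBase  = 0x11A7
vCount = 21
tCount = 28
sCount = 11172        -- 19 * 21 * 28 precomposed syllables

-- Hypothetical helper: split a precomposed syllable into its jamo.
-- Returns Nothing for characters outside the Hangul syllable block.
decomposeHangul :: Char -> Maybe String
decomposeHangul c
  | sIndex < 0 || sIndex >= sCount = Nothing
  | ti == 0   = Just [l, v]
  | otherwise = Just [l, v, chr (tBase + ti)]
  where
    sIndex   = ord c - sBase
    (tn, ti) = sIndex `quotRem` tCount   -- division by 28
    (li, vi) = tn `quotRem` vCount       -- division by 21
    l = chr (lBase + li)
    v = chr (vBase + vi)
```

Since `sIndex` is always in the range 0 to 11171 here, a custom division only needs to be correct for small non-negative inputs.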

Multiplications are quick on modern hardware, so I suggest using something branchless along these lines:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Data.Bits
import GHC.Exts

-- Input must be non-negative
quotRem21 :: Int -> (Int, Int)
quotRem21 n
  | finiteBitSize (0 :: Word) /= 64
  = n `quotRem` 21
  | otherwise
  = (fromIntegral q, fromIntegral (w - 21 * q))
  where
    w = fromIntegral n
    high = highMul w 14054662151397753613 -- (2^68+17)/21
    q = high `shiftR` 4

-- Input must be non-negative
quotRem28 :: Int -> (Int, Int)
quotRem28 n
  | finiteBitSize (0 :: Word) /= 64
  = n `quotRem` 28
  | otherwise
  = (fromIntegral q, fromIntegral r)
  where
    w = fromIntegral n
    high = highMul w 5270498306774157605 -- (2^65+3)/7
    q = high `shiftR` 3
    prod = (q `shiftL` 3 - q) `shiftL` 2
    r = w - prod

-- Input must be non-negative
divisibleBy28 :: Int -> Bool
divisibleBy28 n = n .&. 3 == 0 && divisibleBy7 (n `shiftR` 2)

-- Input must be non-negative
divisibleBy7 :: Int -> Bool
divisibleBy7 n
  | finiteBitSize (0 :: Word) /= 64
  = n `rem` 7 == 0
  | otherwise
  = w == (q `shiftL` 3) - q
  where
    w = fromIntegral n
    high = highMul w 5270498306774157605 -- (2^65+3)/7
    q = high `shiftR` 1

highMul :: Word -> Word -> Word
highMul (W# x#) (W# y#) = W# high#
  where
    (# high#, _ #) = timesWord2# x# y#
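
The magic constants above can be sanity-checked without `MagicHash` by modelling the same multiply-and-shift in unbounded `Integer` arithmetic. This is only a sketch: the helper names (`quotByMagic`, `agreesWith`, `checks`) are made up for illustration, and the total shift is 64 (from taking the high word) plus the explicit `shiftR` amount used in each function:

```haskell
import Data.Bits (shiftR)

-- Magic multipliers quoted in the thread, recomputed from their formulas.
magic21, magic7 :: Integer
magic21 = (2 ^ (68 :: Int) + 17) `div` 21
magic7  = (2 ^ (65 :: Int) + 3) `div` 7

-- Reference model of the multiply-and-shift quotient. Taking the high
-- 64 bits and then shifting by k equals shifting the product by 64 + k.
quotByMagic :: Integer -> Int -> Integer -> Integer
quotByMagic magic totalShift n = (n * magic) `shiftR` totalShift

-- True when the model agrees with `quot` for every input in the list.
agreesWith :: Integer -> Integer -> Int -> [Integer] -> Bool
agreesWith d magic totalShift =
  all (\n -> quotByMagic magic totalShift n == n `quot` d)

-- Hangul syllable indices, plus the largest 64-bit Int as an edge case.
inputs :: [Integer]
inputs = [0 .. 11171] ++ [2 ^ (63 :: Int) - 1]

checks :: [Bool]
checks =
  [ magic21 == 14054662151397753613
  , magic7 == 5270498306774157605
  , agreesWith 21 magic21 68 inputs   -- quotRem21: 64 + 4
  , agreesWith 28 magic7 67 inputs    -- quotRem28: 64 + 3
  , agreesWith 7 magic7 65 inputs     -- divisibleBy7: 64 + 1
  ]
```

Evaluating `and checks` in GHCi should give `True` if the constants and shifts are consistent.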

Wow! That's very cool. Multiplications are supposed to be quite fast (see https://gmplib.org/~tege/x86-timing.pdf), and there are no branches either.

Can you try that and see how much it helps?

It may be a good idea to have a library that generates custom division operations like this, and perhaps other such utilities. C/C++ has a library called libdivide that helps with division in this way.
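
As a rough sketch of what such a generator could look like, the following computes a multiplier and shift for an arbitrary divisor using the round-up method. Everything here (`Magic`, `mkMagic`, `quotMagic`) is hypothetical and uses `Integer` to sidestep overflow; a real library would need fixed-width variants and is not shown here:

```haskell
import Data.Bits (shiftR)

-- Hypothetical record: multiplier and total right shift for one divisor.
data Magic = Magic { magicMul :: Integer, magicShift :: Int }
  deriving (Eq, Show)

-- Round-up method: for inputs in [0, 2^bits) and a divisor d >= 1,
-- take s = bits + ceil(log2 d) and m = ceil(2^s / d).
mkMagic :: Int -> Integer -> Magic
mkMagic bits d = Magic m s
  where
    -- Smallest k such that 2^k >= d.
    log2Ceil = length (takeWhile (< d) (iterate (* 2) 1))
    s = bits + log2Ceil
    m = (2 ^ s + d - 1) `div` d

-- Quotient via multiply and shift, in unbounded Integer arithmetic.
quotMagic :: Magic -> Integer -> Integer
quotMagic (Magic m s) n = (n * m) `shiftR` s
```

For 63-bit non-negative inputs and divisor 21, `mkMagic 63 21` yields the multiplier 14054662151397753613 with a shift of 68, which matches the constant used in `quotRem21` above.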