Speed up division operation in hangul decomposition
harendra-kumar opened this issue · comments
Just as we used a custom division operation for composition, we can also replace the quotRem
operations in decomposition with a custom division operation. Here is an example:
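import Data.Bits ((.&.), unsafeShiftL, unsafeShiftR)

-- Input must be non-negative.
-- Uses n = 32q + r = 28q + (4q + r), so each step peels off q and retries on 4q + r.
divBy28 :: Int -> (Int, Int)
divBy28 n = go 0 n
  where
    go k i =
        let (q, r) = divBy32 i
        in if (q == 0)
           then
                if r >= 28
                then (k + 1, r - 28)
                else (k, r)
           else go (k + q) (q `unsafeShiftL` 2 + r)

    divBy32 x =
        let q = x `unsafeShiftR` 5
            r = x .&. 31
        in (q, r)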
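For context, here is a minimal sketch of where the divisions by 28 and 21 come from, following the Hangul syllable decomposition arithmetic in the Unicode Standard. The name `decomposeHangul` and the list-based result are assumptions made for illustration; the library's actual decomposition code is structured differently.

```haskell
import Data.Char (chr, ord)

-- Constants from the Unicode Standard's Hangul syllable arithmetic.
sBase, lBase, vBase, tBase, vCount, tCount :: Int
sBase  = 0xAC00  -- first precomposed syllable
lBase  = 0x1100  -- first leading consonant jamo
vBase  = 0x1161  -- first vowel jamo
tBase  = 0x11A7  -- one before the first trailing consonant jamo
vCount = 21
tCount = 28

-- Hypothetical helper: split a precomposed syllable into its jamo.
-- Assumes the input lies in U+AC00..U+D7A3.
decomposeHangul :: Char -> [Char]
decomposeHangul c =
    let sIndex   = ord c - sBase
        (lv, ti) = sIndex `quotRem` tCount  -- division by 28
        (li, vi) = lv `quotRem` vCount      -- division by 21
        l = chr (lBase + li)
        v = chr (vBase + vi)
    in if ti == 0 then [l, v] else [l, v, chr (tBase + ti)]
```

These two `quotRem` calls are the ones a custom division routine would replace.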
Multiplications are quick on modern hardware, so I suggest using something branchless along these lines:
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import Data.Bits
import GHC.Exts

-- Input must be non-negative
quotRem21 :: Int -> (Int, Int)
quotRem21 n
  -- Fall back to the generic operation when Word is not 64 bits wide
  | finiteBitSize (0 :: Word) /= 64
  = n `quotRem` 21
  | otherwise
  = (fromIntegral q, fromIntegral (w - 21 * q))
  where
    w    = fromIntegral n
    high = highMul w 14054662151397753613 -- (2^68+17)/21
    q    = high `shiftR` 4

-- Input must be non-negative
quotRem28 :: Int -> (Int, Int)
quotRem28 n
  | finiteBitSize (0 :: Word) /= 64
  = n `quotRem` 28
  | otherwise
  = (fromIntegral q, fromIntegral r)
  where
    w    = fromIntegral n
    high = highMul w 5270498306774157605 -- (2^65+3)/7
    q    = high `shiftR` 3
    prod = (q `shiftL` 3 - q) `shiftL` 2 -- 28 * q, computed as (8q - q) * 4
    r    = w - prod

-- Input must be non-negative
divisibleBy28 :: Int -> Bool
divisibleBy28 n = n .&. 3 == 0 && divisibleBy7 (n `shiftR` 2)

-- Input must be non-negative
divisibleBy7 :: Int -> Bool
divisibleBy7 n
  | finiteBitSize (0 :: Word) /= 64
  = n `rem` 7 == 0
  | otherwise
  = w == (q `shiftL` 3) - q -- 7 * q, computed as 8q - q
  where
    w    = fromIntegral n
    high = highMul w 5270498306774157605 -- (2^65+3)/7
    q    = high `shiftR` 1

-- High word of the double-width product of two Words
highMul :: Word -> Word -> Word
highMul (W# x#) (W# y#) = W# high#
  where
    (# high#, _ #) = timesWord2# x# y#
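As a sanity check on the constants, the reciprocals can be recomputed with arbitrary-precision `Integer` and compared against plain `div` over the Hangul syllable index range (0 to 11171, that is 0xD7A3 - 0xAC00). This is only a sketch: `magic21`, `magic7`, `mulShift` and `agreesOnHangulRange` are names made up for illustration, and `mulShift` models the 64-bit case ("take the high 64 bits of the product, then shift right") rather than calling `timesWord2#`.

```haskell
-- Reciprocal constants, recomputed exactly.
magic21, magic7 :: Integer
magic21 = (2 ^ 68 + 17) `div` 21
magic7  = (2 ^ 65 + 3) `div` 7

-- Integer model of: high 64 bits of n * m, then shift right by s.
mulShift :: Integer -> Int -> Integer -> Integer
mulShift m s n = (n * m) `div` (2 ^ (64 + s))

-- True if the multiply-and-shift quotients match `div` for every
-- Hangul syllable index.
agreesOnHangulRange :: Bool
agreesOnHangulRange = all ok [0 .. 11171]
  where
    ok n = mulShift magic21 4 n == n `div` 21
        && mulShift magic7 3 n == n `div` 28
```

The approach works because the rounding error introduced by the rounded-up reciprocal is far too small to push any quotient in this range across an integer boundary.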
Wow! That's very cool. Multiplications are supposed to be quite fast: https://gmplib.org/~tege/x86-timing.pdf. And we don't have branches either.
Can you try that and see how much it helps?
It may be a good idea to have a library that generates custom division operations like this, and maybe other such utilities. C/C++ has a library called libdivide that helps with division.
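To make the idea of generating such operations concrete, here is a rough sketch of a brute-force search for a multiplier and shift that reproduce `div` by a given divisor over a bounded input range. `findMagic` is a hypothetical helper, not part of any existing library, and it says nothing about how libdivide picks its constants. It tries the rounded-up reciprocal for each shift, keeps only multipliers that fit in 64 bits, and verifies every input exhaustively.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical helper: find (multiplier, shift) such that
--   (n * multiplier) `div` 2^(64 + shift) == n `div` d
-- for every n in [0, maxN]. Returns the candidate with the smallest shift.
findMagic :: Integer -> Integer -> Maybe (Integer, Int)
findMagic d maxN = listToMaybe
    [ (m, s)
    | s <- [0 .. 63]
    , let m = (2 ^ (64 + s) + d - 1) `div` d  -- ceiling of 2^(64+s) / d
    , m < 2 ^ 64                              -- must fit in a 64-bit Word
    , all (\n -> (n * m) `div` 2 ^ (64 + s) == n `div` d) [0 .. maxN]
    ]
```

For inputs as small as Hangul syllable indices, a search like this may settle on a shift of zero, so the final shift could be dropped altogether; whether that is measurably faster would need benchmarking.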