haskell / text

Haskell library for space- and time-efficient operations over Unicode text.

Home Page: http://hackage.haskell.org/package/text

Unpacking in reverse

chris-martin opened this issue.

The best way I've come up with so far to make a list that traverses a Text backwards is:

import Data.Text (Text, unsnoc)

unpackReverse :: Text -> String
unpackReverse = maybe [] (\(xs, x) -> x : unpackReverse xs) . unsnoc

Is there a way to make this substantially more efficient? Would it be worth adding to the library?
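Here is a quick sanity check of the unsnoc-based definition above. The `main` is only a demo, and `OverloadedStrings` is enabled so the string literals can be `Text`. Because `unsnoc` splits off one whole `Char` at a time (the Data.Text docs list it as O(1)), characters outside the BMP come back intact. Forcing the first m characters of the result should also cost about m `unsnoc` steps rather than a walk over the entire Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- The definition from the question: peel off the last Char with unsnoc
-- and cons it onto the (lazily built) reverse of the remaining prefix.
unpackReverse :: Text -> String
unpackReverse = maybe [] (\(xs, x) -> x : unpackReverse xs) . Text.unsnoc

main :: IO ()
main = do
  -- U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP; unsnoc returns
  -- it as a single Char. 'print' shows it as an escape, so output stays ASCII.
  print (unpackReverse "x\x1D11Ey")
  -- Only the demanded prefix of the result is produced: 5 unsnoc steps,
  -- not a traversal of all 3,000,000 characters.
  putStrLn (take 5 (unpackReverse (Text.replicate 1000000 "abc")))
```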

Maybe with foldr?

I've been very confused by the foldr documentation.

from right to left [...] evaluation actually traverses the Text from left to right

If I have a Text of length n and I force the first m characters from the reverse-unpacked String, would that be O(n) or O(m) with a foldr approach? Internally, foldr is using stream, not reverseStream, so I would think it's not actually decoding from the right.
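To make the order question concrete, here is a small check that uses only `Data.Text.foldr` and `Data.Text.foldl`. The names `viaFoldr` and `viaFoldl` are just for illustration. `foldr (:) []` returns the characters in their original order, so forcing the first m elements gives the first m characters of the Text, not the last m. `foldl (flip (:)) []` does produce the reverse. Assuming `Text.foldl` streams left to right like `foldr`, though, it is a left fold that cannot return its first element until it has consumed the whole Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- foldr associates to the right, but with (:) and [] it simply rebuilds
-- the characters in their original, left-to-right order.
viaFoldr :: Text -> String
viaFoldr = Text.foldr (:) []

-- A flipped cons in a left fold yields the reversed String. Assuming the
-- fold consumes the Text left to right, nothing is returned until the
-- last character has been reached.
viaFoldl :: Text -> String
viaFoldl = Text.foldl (flip (:)) []

main :: IO ()
main = do
  let t = "hello" :: Text
  putStrLn (viaFoldr t)
  putStrLn (viaFoldl t)
```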

I suppose I have the same question/request about packing. We have unfoldr but not unfoldl, so I don't see a more efficient way to reverse-pack than Text.reverse . Text.pack. Since packing is strict anyway, I don't think this matters as much as the unpacking question, but it still seems like one ought to be able to do better.
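To pin down what an `unfoldl` would mean, here is a sketch of a hypothetical `unfoldl`. It is not part of Data.Text. Each step returns the next seed plus the character that goes on the *left* of everything produced so far. This version just reuses `Text.unfoldr` and reverses the result, so it is no faster than `Text.reverse . Text.pack`. It only states the intended semantics. The `showNatural` example, also hypothetical, is a natural right-to-left producer, because decimal digits come out least significant first.

```haskell
import Data.Char (intToDigit)
import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical 'unfoldl' (not in Data.Text): each step yields the next
-- seed and the Char to place to the LEFT of what has been built so far.
-- Implemented via Text.unfoldr + Text.reverse, so it only captures the
-- semantics, not a faster right-to-left packer.
unfoldl :: (b -> Maybe (b, Char)) -> b -> Text
unfoldl f = Text.reverse . Text.unfoldr step
  where
    -- Text.unfoldr expects (Char, seed), so swap the pair.
    step b = fmap (\(b', c) -> (c, b')) (f b)

-- Hypothetical example: render a non-negative Int in decimal. Digits are
-- generated right to left. Returns "" for 0, and a negative argument
-- would never reach 0, so it must not be passed one.
showNatural :: Int -> Text
showNatural = unfoldl digit
  where
    digit 0 = Nothing
    digit n = Just (n `div` 10, intToDigit (n `mod` 10))

main :: IO ()
main = print (showNatural 1234)
```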

I don't see any substantial time difference between forward and reverse unpacking, so I suppose you're right about foldr.

{-# LANGUAGE OverloadedStrings #-}  -- needed for the "abc" Text literals

import Criterion.Main
import Data.Text (Text)
import qualified Data.Text as Text

-- First 200 characters via unpack.
forward :: Text -> String
forward = take 200 . Text.unpack

-- Despite the name, foldr (:) [] also yields the characters in forward
-- order, so this measures the same left-to-right traversal as 'forward'.
backward :: Text -> String
backward = take 200 . Text.foldr (:) []

main :: IO ()
main = defaultMain
  [ bgroup "forward"
      [ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf forward t
      , env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf forward t
      , env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf forward t
      ]
  , bgroup "backward"
      [ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf backward t
      , env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf backward t
      , env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf backward t
      ]
  ]

benchmarking forward/1000
time                 2.441 μs   (2.396 μs .. 2.496 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 2.418 μs   (2.396 μs .. 2.454 μs)
std dev              94.17 ns   (67.51 ns .. 130.5 ns)
variance introduced by outliers: 52% (severely inflated)

benchmarking forward/1000000
time                 2.529 μs   (2.482 μs .. 2.586 μs)
                     0.997 R²   (0.995 R² .. 0.999 R²)
mean                 2.520 μs   (2.489 μs .. 2.582 μs)
std dev              139.6 ns   (97.44 ns .. 199.7 ns)
variance introduced by outliers: 69% (severely inflated)

benchmarking forward/1000000000
time                 2.466 μs   (2.435 μs .. 2.501 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.455 μs   (2.440 μs .. 2.476 μs)
std dev              59.03 ns   (41.70 ns .. 77.29 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking backward/1000
time                 2.499 μs   (2.449 μs .. 2.552 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 2.505 μs   (2.469 μs .. 2.574 μs)
std dev              160.0 ns   (97.61 ns .. 236.9 ns)
variance introduced by outliers: 74% (severely inflated)

benchmarking backward/1000000
time                 2.490 μs   (2.454 μs .. 2.529 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 2.480 μs   (2.458 μs .. 2.515 μs)
std dev              91.78 ns   (59.58 ns .. 151.2 ns)
variance introduced by outliers: 49% (moderately inflated)

benchmarking backward/1000000000
time                 2.467 μs   (2.447 μs .. 2.495 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.469 μs   (2.453 μs .. 2.491 μs)
std dev              60.28 ns   (46.57 ns .. 75.09 ns)
variance introduced by outliers: 30% (moderately inflated)

Perhaps all I needed here was clearer documentation.

I had a brainfart moment. For a lazy reverse-into-String function, the best safe solution right now seems to be unsnoc, as you did. Another option is Data.Text.Unsafe.reverseIter, which may or may not be faster.
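Here is a minimal sketch of two other spellings that use only the public API. Both names are made up for illustration. The first is the same unsnoc strategy written with `Data.List.unfoldr`, so it stays lazy. The second, `reverse . Text.unpack`, is shown for contrast: `reverse` must reach the end of the list before returning its first element. `reverseIter` is left out on purpose. Its index argument is counted in code units of the internal encoding, which changed between text 1.x (UTF-16) and 2.x (UTF-8).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List (unfoldr)
import Data.Text (Text)
import qualified Data.Text as Text
import Data.Tuple (swap)

-- Same unsnoc strategy without explicit recursion: each unfoldr step
-- splits off the last Char. Lazy, one unsnoc per character demanded.
unpackReverseLazy :: Text -> String
unpackReverseLazy = unfoldr (fmap swap . Text.unsnoc)

-- For contrast: reverse cannot yield anything until it has walked the
-- whole list, so this traverses the entire Text even for 'take 5'.
unpackReverseEager :: Text -> String
unpackReverseEager = reverse . Text.unpack

main :: IO ()
main = do
  let t = Text.replicate 1000000 "abc"
  putStrLn (take 5 (unpackReverseLazy t))
  putStrLn (take 5 (unpackReverseEager t))
```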

Ideally, I think foldl (not foldr) would provide a better solution, but right now all the folds go through stream, so they all traverse left to right. My point is that we could implement a lazy foldl as a right-to-left traversal, which would give you a lazy reverse-into-String without having to write a recursive function yourself.
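The proposal can be sketched outside the library. `foldlRTL` is a hypothetical name, not part of Data.Text, and it is built on the public `unsnoc`. It is not how the library would necessarily implement it. It has the same meaning as Prelude's `foldl` over `Text.unpack`, but the outermost application of `f` is built first, from the last character. So a function that is lazy in its accumulator, such as `flip (:)`, produces output immediately.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical lazy left fold that traverses right to left (not in
-- Data.Text). Semantically:
--   foldlRTL f z "abc" == f (f (f z 'a') 'b') 'c'
-- but the call for the LAST character is the outermost one, so the
-- rest of the Text is only visited if f demands its accumulator.
foldlRTL :: (b -> Char -> b) -> b -> Text -> b
foldlRTL f z t = case Text.unsnoc t of
  Nothing      -> z
  Just (xs, x) -> f (foldlRTL f z xs) x

-- Lazy reverse unpack with no hand-written recursion at the use site.
unpackReverse :: Text -> String
unpackReverse = foldlRTL (flip (:)) []

main :: IO ()
main = putStrLn (take 5 (unpackReverse (Text.replicate 1000000 "abc")))
```

With a strict accumulator such as a running sum, this recursion nests one call per character and uses stack proportional to the length of the Text. For that case `Text.foldl'` remains the better tool.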