Unpacking in reverse
chris-martin opened this issue · comments
The best way I've come up with so far to make a list that traverses a Text
backwards is:
import Data.Text (Text, unsnoc)
unpackReverse :: Text -> String
unpackReverse = maybe [] (\(xs, x) -> x : unpackReverse xs) . unsnoc
Is there a way to make this substantially more efficient? Would it be worth adding to the library?
Maybe with foldr
?
I've been very confused by the foldr
documentation.
from right to left [...] evaluation actually traverses the Text from left to right
If I have a Text
of length n and I force the first m characters from the reverse-unpacked String
, would that be O(n) or O(m) with a foldr
approach? Internally, foldr
is using stream
, not reverseStream
, so I would think it's not actually decoding from the right.
I suppose I have the same question/request about packing as well. We have unfoldr
but not unfoldl
, and so I'm not sure how better to reverse-pack any more efficiently than Text.reverse . Text.pack
. Since packing is strict anyway, I don't think it matters as much as the unpacking question, but still it seems like one ought to be able to do better.
I don't see any substantial time difference between forward/reverse unpacking, so I suppose you're right about foldr
.
import Criterion.Main
import qualified Data.Text as Text
forward = take 200 . Text.unpack
backward = take 200 . Text.foldr (:) []
main = defaultMain
[ bgroup "forward"
[ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf forward t
, env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf forward t
, env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf forward t
]
, bgroup "backward"
[ env (pure $ Text.replicate 1000 "abc") $ \t -> bench "1000" $ nf backward t
, env (pure $ Text.replicate 1000000 "abc") $ \t -> bench "1000000" $ nf backward t
, env (pure $ Text.replicate 1000000000 "abc") $ \t -> bench "1000000000" $ nf backward t
]
]
benchmarking forward/1000
time 2.441 μs (2.396 μs .. 2.496 μs)
0.998 R² (0.996 R² .. 0.999 R²)
mean 2.418 μs (2.396 μs .. 2.454 μs)
std dev 94.17 ns (67.51 ns .. 130.5 ns)
variance introduced by outliers: 52% (severely inflated)
benchmarking forward/1000000
time 2.529 μs (2.482 μs .. 2.586 μs)
0.997 R² (0.995 R² .. 0.999 R²)
mean 2.520 μs (2.489 μs .. 2.582 μs)
std dev 139.6 ns (97.44 ns .. 199.7 ns)
variance introduced by outliers: 69% (severely inflated)
benchmarking forward/1000000000
time 2.466 μs (2.435 μs .. 2.501 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 2.455 μs (2.440 μs .. 2.476 μs)
std dev 59.03 ns (41.70 ns .. 77.29 ns)
variance introduced by outliers: 29% (moderately inflated)
benchmarking backward/1000
time 2.499 μs (2.449 μs .. 2.552 μs)
0.997 R² (0.996 R² .. 0.999 R²)
mean 2.505 μs (2.469 μs .. 2.574 μs)
std dev 160.0 ns (97.61 ns .. 236.9 ns)
variance introduced by outliers: 74% (severely inflated)
benchmarking backward/1000000
time 2.490 μs (2.454 μs .. 2.529 μs)
0.998 R² (0.998 R² .. 0.999 R²)
mean 2.480 μs (2.458 μs .. 2.515 μs)
std dev 91.78 ns (59.58 ns .. 151.2 ns)
variance introduced by outliers: 49% (moderately inflated)
benchmarking backward/1000000000
time 2.467 μs (2.447 μs .. 2.495 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 2.469 μs (2.453 μs .. 2.491 μs)
std dev 60.28 ns (46.57 ns .. 75.09 ns)
variance introduced by outliers: 30% (moderately inflated)
Perhaps all that I needed here was more clear documentation.
I had a brainfart moment. For a lazy reverse-into-string function, right now, the best safe solution seems to be using unsnoc
as you did. Another way is to use Data.Text.Unsafe.reverseIter
, which might be faster, or maybe not.
Ideally, I think a better solution would be provided by foldl
(not foldr
), but right now because all the folds go through stream
, they all do left-to-right traversals. My point is that we could implement a lazy foldl
as a right-to-left traversal, and that would give you lazy reverse-into-string without having to write a recursive function yourself.