Gabriella439 / turtle

Shell programming, Haskell style

Possible memory leak

GregorySchwartz opened this issue

I'm not sure where the problem is, but this overflows the stack without optimizations, and with optimizations its memory use keeps growing:

reduce Fold.length . toLines . TB.toUTF8 . TB.decompress (WindowBits 31) . TB.input $ "file.txt.gz"

I would think it is streaming and should just need constant memory, right? Looking at the profiling, it looks like the issue is between toLines . TB.toUTF8 . TB.decompress (WindowBits 31).

Attached is a partial heap profile for the above code.
[Image: heap profile]

Actually, it happens with just reduce Fold.length . input $ "file.txt". Am I misunderstanding the streaming / strict fold of foldl and turtle?

If I instead run Fold.fold Fold.minimum . lines <$> readFile "test.tsv", there seems to be no real issue: the heap profile is jagged rather than a solid flat line, but it stays horizontal, so there is no real growth. Is there an issue with Shell's fold or reduce? Maybe its instances?

Also, reduce Fold.minimum $ select [1..1000000000] has a linear increase in memory use, while Fold.fold Fold.minimum [1..1000000000] does not. It must be a Shell issue.

@GregorySchwartz: Yeah, Turtle's fold should run in constant space for these examples. I'm still looking into this.

@Gabriel439 Is there anything I can do to help with this issue? Turtle plays a critical role in my programs, and this high memory usage / potential slowness will affect benchmarks. The only problem is that I'm not too familiar with the internals of Shell...

@GregorySchwartz: I think I know what the problem is. The big clue is that turtle leaks but foldl does not, and I suspect the reason is that turtle's fold uses translate under the hood, which introduces a space leak:

translate :: FoldM IO a b -> FoldShell a b
translate (FoldM step begin done) = FoldShell step' Nothing done'
  where
    step' Nothing a = do
        x  <- begin
        x' <- step x a
        return (Just x')
    step' (Just x) a = do
        x' <- step x a
        return (Just x')

    done' Nothing = do
        x <- begin
        done x
    done' (Just x) = do
        done x

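Specifically, the lazy Just does not force the accumulator it wraps, so unevaluated thunks can build up across steps. translate should be using a strict Maybe instead, like the one in Control.Foldl.Internal: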

https://hackage.haskell.org/package/foldl-1.4.6/docs/src/Control.Foldl.Internal.html#Just%27
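As a rough illustration of the strict-Maybe idea above, here is a minimal sketch that compiles with only base. It is not necessarily what #381 does. FoldM and FoldShell below are local stand-ins assumed to mirror the shapes of the foldl and turtle types, and Maybe', translateStrict, runOverList, and sumFold are names made up for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Local stand-ins (assumptions) mirroring the shapes of foldl's FoldM and
-- turtle's FoldShell, so this sketch needs no packages beyond base.
data FoldM m a b = forall x . FoldM (x -> a -> m x) (m x) (x -> m b)
data FoldShell a b = forall x . FoldShell (x -> a -> IO x) x (x -> IO b)

-- A strict Maybe: the bang forces the payload to weak head normal form
-- whenever the Just' constructor itself is evaluated.
data Maybe' a = Nothing' | Just' !a

-- Same structure as translate, but the accumulator lives in Maybe', so
-- each step evaluates it instead of wrapping a growing chain of thunks.
translateStrict :: FoldM IO a b -> FoldShell a b
translateStrict (FoldM step begin done) = FoldShell step' Nothing' done'
  where
    step' Nothing' a = do
        x  <- begin
        x' <- step x a
        return $! Just' x'
    step' (Just' x) a = do
        x' <- step x a
        return $! Just' x'

    done' Nothing'  = do
        x <- begin
        done x
    done' (Just' x) = done x

-- Hypothetical driver: feed a list through a FoldShell one element at a time.
runOverList :: FoldShell a b -> [a] -> IO b
runOverList (FoldShell step begin done) = go begin
  where
    go x []       = done x
    go x (a : as) = do
        x' <- step x a
        go x' as

-- A pure sum lifted into IO; the step returns an unevaluated (x + a),
-- which is the kind of thunk a lazy Just would leave unforced.
sumFold :: FoldM IO Int Int
sumFold = FoldM (\x a -> return (x + a)) (return 0) return
```

With these definitions, runOverList (translateStrict sumFold) [1 .. 100000] should evaluate the running sum at every step rather than deferring it all to the end.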

@GregorySchwartz: The fix is up here: #381

That fixed it, thank you!

@GregorySchwartz: You're welcome! 🙂