ndmitchell / spaceleak

Notes on space leaks

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Spaceleak detection

Every large Haskell program almost inevitably contains space leaks. Space leaks are often difficult to detect, but relatively easy to fix once detected (typically insert a !). This page gives a simple technique to detect some space leaks, along with a set of blog posts using that technique and detailing other space leaks.

Spaceleak stack-limiting technique

The stack-limiting technique is useful for detecting a common kind of spaceleak, a situation where an excessive number of interdependent thunks are created and eventually evaluated. The technique works because thunks are evaluated on the stack, so by limiting the stack size we effectively limit the number of interdependent thunks that we can evaluate.

To find space leaks, given a program and a representative run (e.g. the test suite, a suitable input file):

  • Compile the program for profiling, e.g. ghc --make Main.hs -rtsopts -prof -fprof-auto.
  • Run the program with a specific stack size, e.g. ./Main +RTS -K100K to run with a 100Kb stack.
  • Increase/decrease the stack size until you have determined the minimum stack for which the program succeeds, e.g. -K33K.
  • Reduce the stack by a small amount and rerun with -xc, e.g. ./Main +RTS -K32K -xc.
  • The -xc run will print out the stack trace on every exception, look for the one which says stack overflow (likely the last one) and look at the stack trace to determine roughly where the leak is.
  • Attempt to fix the space leak, confirm by rerunning with -K32K.
  • Repeat until the test works with a small stack, typically -K1K.
  • Add something to your test suite to ensure that if a space leak is ever introduced then it fails, e.g. ghc-options: -with-rtsopts=-K1K in Cabal.

This technique does not detect when an excessive number of thunks are created but never evaluated, or when a small number of thunks hold on to a large amount of live data.

Note that typically the main thread requires greater than 1K stack, and that once GHC crosses 1K it makes the stack 32K bigger, as a result anything less than 33K (e.g. 1K) is often rounded up to 33K. To obtain a more precise stack measurement use -kc2k which increases the stack in 2K chunks (but does cause only half the stack to be used, so bear that in mind when scaling -K numbers). That said, 32K corresponds to approximately 1000 stack evaluation slots, which suggests you don't have any significant space leaks.

Below are links to further information, including lots of practical examples of real space leaks caught using the method above.

Talks

Blog tales

Fixes

Code changes

Publications

Other approaches

Notes

On the main thread the stack limit is less effective, usually more like 8K if you request 1K. On spawned threads it seems much better. Solution is to always join . onceFork on the main thread.

About

Notes on space leaks


Languages

Language:Haskell 100.0%