fpco / weigh

Measure allocations of Haskell functions/values

Getting insanely high number of garbage collections

mrkkrp opened this issue

Here is a benchmark I wrote for my MMark markdown processor:

module Main (main) where

import Weigh
import qualified Data.Text.IO as T
import qualified Text.MMark   as MMark

main :: IO ()
main = mainWith $ do
  setColumns [Case, Allocated, GCs, Max]
  bparser "data/bench-paragraph.md"

----------------------------------------------------------------------------
-- Helpers

bparser
  :: FilePath          -- ^ File from which the input has been loaded
  -> Weigh ()
bparser path = action name (p <$> T.readFile path)
  where
    name = "with file: " ++ path
    p    = MMark.parse path

When I run it, I get the following result:

Case                                Allocated            GCs     Max
with file: data/bench-paragraph.md  4,102,096  4,294,967,299  46,160

The values under the "Allocated" and "Max" columns look realistic, but "GCs" is the number of times garbage collection has run, right? It says GC has run more than 4 billion times, which looks like an Int overflow or something to me.

I also benchmarked the same thing with Criterion, and it shows that parsing that paragraph takes just 758.9 μs. It's not possible that in that time the Haskell runtime managed to perform 4,294,967,299 garbage collections.
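
For reference, the Criterion benchmark looks roughly like this (a sketch, assuming the parse result has an NFData instance; the file path is the same one used above):

module Main (main) where

import Criterion.Main
import qualified Data.Text.IO as T
import qualified Text.MMark   as MMark

main :: IO ()
main = defaultMain
  [ env (T.readFile path) $ \input ->
      -- force the full parse result on every iteration
      bench ("with file: " ++ path) (nf (MMark.parse path) input)
  ]
  where
    path = "data/bench-paragraph.md"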

Agreed, seems like 32-bit underflow. 4,294,967,295 is the maximum for a Word32. The fact that it shows 299 implies that the number is held in a larger type, probably an Int64, but I bet it originated in a Word32 with the value -4 or something and got copied into the larger signed type. E.g.

> fromIntegral ((-1) :: Word32) + 4 :: Int64
4294967299

Can you give me your stack resolver so that I can reproduce this precisely?

Also, I presume you're on a 64-bit platform?

Yes, 64-bit. Here is the branch with the benchmarks; the stack.yaml there should be enough to reproduce it precisely:

https://github.com/mrkkrp/mmark/tree/add-benchmarks

Yep, that's it.

GCStats has numGcs as an Int64 in the base version I wrote against: http://hackage.haskell.org/package/base-4.10.0.0/docs/GHC-Stats.html#t:GCStats
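
(For reference, reading that count through the old API is as simple as this sketch; it requires running the program with +RTS -T, otherwise getGCStats throws:)

import GHC.Stats (GCStats (..), getGCStats)

-- Print the total number of GCs so far; numGcs :: Int64 in base-4.10.
printGcCount :: IO ()
printGcCount = do
  stats <- getGCStats
  print (numGcs stats)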

But that API is deprecated in GHC 8.2 in favor of RTSStats, and I see that gcs is now a Word32. So I imagine there's a mistranslation in the backwards-compatibility layer over the new API. The values that come back for GCs are (8589934597, 4294967297). So it seems to be a borked API.

Not to worry, I'll have to put in some CPP to detect this version of things and use the RTSStats if available.
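
Roughly this shape, as a sketch (not necessarily the exact code that will land in weigh):

{-# LANGUAGE CPP #-}

import Data.Int (Int64)
import GHC.Stats

-- Keep the count in a wide signed type no matter which API supplied it.
getGcCount :: IO Int64
getGcCount =
#if MIN_VERSION_base(4,10,0)
  fromIntegral . gcs <$> getRTSStats   -- new API: gcs :: Word32
#else
  numGcs <$> getGCStats                -- old API: numGcs :: Int64
#endif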

@mrkkrp Fixed in weigh-0.0.6.