Getting insanely high number of garbage collections
mrkkrp opened this issue · comments
Here is a benchmark I wrote for my MMark markdown processor:
module Main (main) where

import Weigh
import qualified Data.Text.IO as T
import qualified Text.MMark as MMark

main :: IO ()
main = mainWith $ do
  setColumns [Case, Allocated, GCs, Max]
  bparser "data/bench-paragraph.md"

----------------------------------------------------------------------------
-- Helpers

bparser
  :: FilePath -- ^ File from which the input has been loaded
  -> Weigh ()
bparser path = action name (p <$> T.readFile path)
  where
    name = "with file: " ++ path
    p = MMark.parse path
When I run it, I get the following result:
Case                                Allocated            GCs     Max
with file: data/bench-paragraph.md  4,102,096  4,294,967,299  46,160
The values under the "Allocated" and "Max" columns look realistic, but "GCs" is the number of times garbage collection has happened, right? And it says GC has been run more than 4 billion times? Looks like an Int overflow or something to me.
I also benchmarked the same thing with Criterion, and it shows that parsing that paragraph takes just 758.9 μs. It's not possible that the Haskell runtime managed to perform 4,294,967,299 garbage collections in that period of time.
Agreed, seems like 32-bit Int underflow. 4,294,967,295 is the maximum for Word32. The fact that it shows 299 implies that the number is held in a larger size, probably an Int64 - but I bet it originated in a Word32 with the value -4 or something, and got copied into an int. E.g.
> fromIntegral ((-1) :: Word32) + 4 :: Int64
4294967299
Can you give me your stack resolver so that I can reproduce this precisely?
Also, I presume you're on a 64-bit platform?
Yes, 64-bit. Here is the branch with benchmarks, and the stack.yaml there should be enough to reproduce this precisely:
Yep, that's it.
GCStats has numGcs as an Int64 in the base I wrote against: http://hackage.haskell.org/package/base-4.10.0.0/docs/GHC-Stats.html#t:GCStats
But that is deprecated in GHC 8.2 in favor of RTSStats, and I see that gcs is now a Word32. So I imagine there's a mistranslation from their new API for backwards-compatibility. The values that come back are: (8589934597,4294967297) for GCs. So it seems to be a borked API.
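To make the arithmetic behind that suspicion concrete, here is a small sketch that splits each reported GC count into its upper and lower 32-bit halves. The helper name `splitHalves` is made up for illustration. It only shows that every value is a small number plus a multiple of 2^32, which is consistent with a 32-bit counter being widened incorrectly. It does not prove where the bug lives.

```haskell
module Main (main) where

import Data.Bits (shiftR, (.&.))
import Data.Int (Int64)

-- | Split a 64-bit value into (upper 32 bits, lower 32 bits).
-- Hypothetical helper, not part of weigh or base.
splitHalves :: Int64 -> (Int64, Int64)
splitHalves n = (n `shiftR` 32, n .&. 0xFFFFFFFF)

main :: IO ()
main =
  -- The value from the benchmark table, then the two values quoted above.
  mapM_ (print . splitHalves) [4294967299, 8589934597, 4294967297]
  -- prints (1,3), (2,5) and (1,1), one per line
```

In each case the lower half is a single-digit number, which would be a believable GC count for a sub-millisecond benchmark.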
Not to worry, I'll have to put in some CPP to detect this version of things and use the RTSStats if available.
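A rough sketch of what such a CPP switch could look like follows. This is not the actual weigh patch. The function name `gcCount` is hypothetical, and it assumes that testing `__GLASGOW_HASKELL__ >= 802` is an acceptable way to detect the new API. The GHC.Stats names used (getRTSStats, getRTSStatsEnabled, gcs, getGCStats, getGCStatsEnabled, numGcs) are the ones from base.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import Data.Int (Int64)
import GHC.Stats

-- | Number of garbage collections so far, or Nothing when the program
-- was not started with RTS statistics enabled (e.g. +RTS -T).
-- Hypothetical helper; weigh's real implementation may differ.
gcCount :: IO (Maybe Int64)
#if __GLASGOW_HASKELL__ >= 802
-- GHC 8.2 and later: read the Word32 field from RTSStats and widen it.
gcCount = do
  enabled <- getRTSStatsEnabled
  if enabled
    then fmap (Just . fromIntegral . gcs) getRTSStats
    else return Nothing
#else
-- Older GHC: GCStats already exposes numGcs as an Int64.
gcCount = do
  enabled <- getGCStatsEnabled
  if enabled
    then fmap (Just . numGcs) getGCStats
    else return Nothing
#endif

main :: IO ()
main = gcCount >>= print
```

Checking the "enabled" flag first matters because both getRTSStats and getGCStats throw an exception when statistics collection is off.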
@mrkkrp Fixed in weigh-0.0.6.