Improve the abysmal performance of type synonym expansion
RyanGlScott opened this issue · comments
While working on some code involving singletons
recently, I was mildly surprised to discover that my code looped forever. I say "mildly" since GHC has been known to take its sweet time when compiling TH-generated code, but in this particular instance, the culprit was that TH was looping forever when generating the code itself. Here is my aggressively minimized test case:
{-# LANGUAGE TemplateHaskell #-}
module Bug where
import Language.Haskell.TH
import Language.Haskell.TH.Desugar
type G = Either () ()
type F = Either G G
type E = Either F F
type D = Either E E
type C = Either D D
type B = Either C C
type A = Either B B
$(do t <- expand (DConT ''A)
runIO $ print t
return [])
This takes a whopping 11.5 seconds to compile. The reason this happens is because th-desugar
's expand
function calls reify
on every type constructor it encounters in the process of type synonym function. This means that in the test case above, reify
gets invoked an exponential number of times, which certainly isn't ideal.
GHC itself seems to be able to cope with this problem adequately, since if you change the test case to this:
{-# LANGUAGE TypeOperators #-}
module Works where
import Data.Type.Equality
type G = Either () ()
type F = Either G G
type E = Either F F
type D = Either E E
type C = Either D D
type B = Either C C
type A = Either B B
f :: A :~: Either (Either (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))) (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))))) (Either (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))) (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))))
f = Refl
Then it typechecks in less than half a second. There ought to be some way we can improve expand
's performance to get closer to this time. Perhaps we should memoize the results of invoking reify
when expanding type synonyms to make subsequent lookups cheaper?
You conjecture that the problem is the cost of reify
. First, it would be good to test that conjecture, either by timing it or somehow cheating in th-desugar (just for this test). But reify
shouldn't really be all that slow. It's an env lookup that then packages its response. Here's another way to express my reaction: if memoization is th-desugar's solution, then we've discovered a GHC bug.
Could this have anything to do with external interpreters?
I hacked th-desugar
to memoize the uses of reify
in L.H.T.Desugar.Expand
, but the time it takes to compile the program in this issue remained about the same, unfortunately.
Could this have anything to do with external interpreters?
My previous measurements were done without -fexternal-interpreter
, actually. In fact, compiling this program with -fexternal-interpreter
enabled makes it significantly slower to compile—it goes from 11.5 seconds to over a minute!
Thanks to Matthew PIckering's invaluable assitance, I was able to build the program above with profiling. Here are the results with just -prof
:
COST CENTRE MODULE SRC %time %alloc
mkQ Data.Generics.Aliases src/Data/Generics/Aliases.hs:(100,1)-(102,36) 54.9 63.0
reifyWithLocals_maybe Language.Haskell.TH.Desugar.Reify Language/Haskell/TH/Desugar/Reify.hs:(59,1)-(61,27) 19.7 14.2
dsDec Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(654,1)-(785,67) 8.0 6.2
concatMapM Language.Haskell.TH.Desugar.Util Language/Haskell/TH/Desugar/Util.hs:(306,1)-(308,19) 3.2 2.7
dsCon' Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(897,1)-(931,41) 2.1 2.8
dsReify Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:1161:1-51 2.0 1.2
initGhcM Main Main.hs:(36,1)-(73,12) 1.9 2.9
dsInfo Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(561,1)-(599,54) 1.4 0.9
mkExtraKindBinders Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(791,1)-(792,91) 1.2 0.7
mkExtraKindBindersGeneric Language.Haskell.TH.Desugar.Util Language/Haskell/TH/Desugar/Util.hs:(197,1)-(200,40) 1.1 1.0
dsCon Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(876,1)-(885,50) 1.0 1.7
And for good measure, I also compiled with -fprof-auto
as well:
COST CENTRE MODULE SRC %time %alloc
gfoldl Language.Haskell.TH.Desugar.AST Language/Haskell/TH/Desugar/AST.hs:51:38-41 38.3 37.7
reifyWithLocals_maybe Language.Haskell.TH.Desugar.Reify Language/Haskell/TH/Desugar/Reify.hs:(59,1)-(61,27) 20.9 14.2
everything.go Data.Generics.Schemes src/Data/Generics/Schemes.hs:122:5-37 12.5 25.5
dsDec Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(654,1)-(785,67) 8.1 6.2
concatMapM Language.Haskell.TH.Desugar.Util Language/Haskell/TH/Desugar/Util.hs:(306,1)-(308,19) 3.4 2.7
mkQ Data.Generics.Aliases src/Data/Generics/Aliases.hs:(100,1)-(102,36) 2.8 0.0
dsCon' Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(897,1)-(931,41) 2.3 2.8
initGhcM Main Main.hs:(36,1)-(73,12) 1.9 2.9
dsReify Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:1161:1-51 1.8 1.2
dsInfo Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(561,1)-(599,54) 1.5 0.9
mkExtraKindBindersGeneric Language.Haskell.TH.Desugar.Util Language/Haskell/TH/Desugar/Util.hs:(197,1)-(200,40) 1.2 0.9
mkExtraKindBinders Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(791,1)-(792,91) 1.0 0.7
dsCon Language.Haskell.TH.Desugar.Core Language/Haskell/TH/Desugar/Core.hs:(876,1)-(885,50) 0.9 1.7
It looks like syb
is the main culprit here. I wonder how much performance we would gain from replacing the use of syb
in L.H.T.Desugar.Expand
with hand-written traversals.
Hand-written traversals ending up being exactly what I needed, since implementing them reduces the compile times to less than a second on the original program in #123 (comment). See #124.