goldfirere / th-desugar

Desugars Template Haskell abstract syntax to a simpler format without changing semantics

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve the abysmal performance of type synonym expansion

RyanGlScott opened this issue · comments

While working on some code involving singletons recently, I was mildly surprised to discover that my code looped forever. I say "mildly" since GHC has been known to take its sweet time when compiling TH-generated code, but in this particular instance, the culprit was that TH was looping forever when generating the code itself. Here is my aggressively minimized test case:

{-# LANGUAGE TemplateHaskell #-}
module Bug where

import Language.Haskell.TH
import Language.Haskell.TH.Desugar

type G = Either () ()
type F = Either G G
type E = Either F F
type D = Either E E
type C = Either D D
type B = Either C C
type A = Either B B

$(do t <- expand (DConT ''A)
     runIO $ print t
     return [])

This takes a whopping 11.5 seconds to compile. The reason this happens is because th-desugar's expand function calls reify on every type constructor it encounters in the process of type synonym function. This means that in the test case above, reify gets invoked an exponential number of times, which certainly isn't ideal.

GHC itself seems to be able to cope with this problem adequately, since if you change the test case to this:

{-# LANGUAGE TypeOperators #-}
module Works where

import Data.Type.Equality

type G = Either () ()
type F = Either G G
type E = Either F F
type D = Either E E
type C = Either D D
type B = Either C C
type A = Either B B

f :: A :~: Either (Either (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))) (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))))) (Either (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))) (Either (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ())))) (Either (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))) (Either (Either (Either () ()) (Either () ())) (Either (Either () ()) (Either () ()))))))
f = Refl

Then it typechecks in less than half a second. There ought to be some way we can improve expand's performance to get closer to this time. Perhaps we should memoize the results of invoking reify when expanding type synonyms to make subsequent lookups cheaper?

You conjecture that the problem is the cost of reify. First, it would be good to test that conjecture, either by timing it or somehow cheating in th-desugar (just for this test). But reify shouldn't really be all that slow. It's an env lookup that then packages its response. Here's another way to express my reaction: if memoization is th-desugar's solution, then we've discovered a GHC bug.

Could this have anything to do with external interpreters?

I hacked th-desugar to memoize the uses of reify in L.H.T.Desugar.Expand, but the time it takes to compile the program in this issue remained about the same, unfortunately.

Could this have anything to do with external interpreters?

My previous measurements were done without -fexternal-interpreter, actually. In fact, compiling this program with -fexternal-interpreter enabled makes it significantly slower to compile—it goes from 11.5 seconds to over a minute!

Thanks to Matthew PIckering's invaluable assitance, I was able to build the program above with profiling. Here are the results with just -prof:

COST CENTRE               MODULE                            SRC                                                   %time %alloc

mkQ                       Data.Generics.Aliases             src/Data/Generics/Aliases.hs:(100,1)-(102,36)          54.9   63.0
reifyWithLocals_maybe     Language.Haskell.TH.Desugar.Reify Language/Haskell/TH/Desugar/Reify.hs:(59,1)-(61,27)    19.7   14.2
dsDec                     Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(654,1)-(785,67)    8.0    6.2
concatMapM                Language.Haskell.TH.Desugar.Util  Language/Haskell/TH/Desugar/Util.hs:(306,1)-(308,19)    3.2    2.7
dsCon'                    Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(897,1)-(931,41)    2.1    2.8
dsReify                   Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:1161:1-51           2.0    1.2
initGhcM                  Main                              Main.hs:(36,1)-(73,12)                                  1.9    2.9
dsInfo                    Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(561,1)-(599,54)    1.4    0.9
mkExtraKindBinders        Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(791,1)-(792,91)    1.2    0.7
mkExtraKindBindersGeneric Language.Haskell.TH.Desugar.Util  Language/Haskell/TH/Desugar/Util.hs:(197,1)-(200,40)    1.1    1.0
dsCon                     Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(876,1)-(885,50)    1.0    1.7

And for good measure, I also compiled with -fprof-auto as well:

COST CENTRE               MODULE                            SRC                                                   %time %alloc

gfoldl                    Language.Haskell.TH.Desugar.AST   Language/Haskell/TH/Desugar/AST.hs:51:38-41            38.3   37.7
reifyWithLocals_maybe     Language.Haskell.TH.Desugar.Reify Language/Haskell/TH/Desugar/Reify.hs:(59,1)-(61,27)    20.9   14.2
everything.go             Data.Generics.Schemes             src/Data/Generics/Schemes.hs:122:5-37                  12.5   25.5
dsDec                     Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(654,1)-(785,67)    8.1    6.2
concatMapM                Language.Haskell.TH.Desugar.Util  Language/Haskell/TH/Desugar/Util.hs:(306,1)-(308,19)    3.4    2.7
mkQ                       Data.Generics.Aliases             src/Data/Generics/Aliases.hs:(100,1)-(102,36)           2.8    0.0
dsCon'                    Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(897,1)-(931,41)    2.3    2.8
initGhcM                  Main                              Main.hs:(36,1)-(73,12)                                  1.9    2.9
dsReify                   Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:1161:1-51           1.8    1.2
dsInfo                    Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(561,1)-(599,54)    1.5    0.9
mkExtraKindBindersGeneric Language.Haskell.TH.Desugar.Util  Language/Haskell/TH/Desugar/Util.hs:(197,1)-(200,40)    1.2    0.9
mkExtraKindBinders        Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(791,1)-(792,91)    1.0    0.7
dsCon                     Language.Haskell.TH.Desugar.Core  Language/Haskell/TH/Desugar/Core.hs:(876,1)-(885,50)    0.9    1.7

It looks like syb is the main culprit here. I wonder how much performance we would gain from replacing the use of syb in L.H.T.Desugar.Expand with hand-written traversals.

Hand-written traversals ending up being exactly what I needed, since implementing them reduces the compile times to less than a second on the original program in #123 (comment). See #124.