fork is too strict
simonmar opened this issue · comments
This simple program does not run in parallel:
import Control.Monad.Par
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
main =
args <- getArgs
let [n,m] = map read args
print $ runPar $ do
i <- new
j <- new
fork (put i (fib n))
fork (put j (fib m))
a <- get i
b <- get j
return (a+b)
It isn't the old problem of fib
not allocating: note that I used the Integer
version that allocates.
No, the problem is that in the direct scheduler, fork
is too strict: the work item is evaluated when put into the work pool (by pushL
, which causes the fib
call to be evaluated in the parent. It doesn't happen with spawnP
, but I think that's only because spawnP
isn't inlined enough to expose the strictness.
I've no problem with put
being strict, but I thing fork
being strict will cause confusion.
Maybe strictly speaking the inputs should be passed via IVars. That's defensible, but I think it would be nice if we didn't have to do that all the time. Making simple examples like the above work properly will avoid confusing users (it confused me, and I supposedly know what I'm doing :-).
I think there are basically two problems:
As you noted, the deque we use is strict, which forces IO
actions to WHNF. That's a bit peculiar as it is, and I don't like it. Furthermore, said deque is intended only as a reference implementation. It's based on Data.Sequence
, which is definitely overkill for this job: we don't need to be able to index into the deque or split it. Indeed, we only need an input-restricted deque: we never use pushR
.
The other problem is that put
and newFull
perform the deepseq
on the wrong side. It would seem more reasonable to use something more like this:
-- A version of put_ that doesn't force at all
put__ :: ...
-- This gets eta-expanded automatically
put_ iv !a = put__ iv a
-- We need to help GHC a bit with this one
put iv a = liftIO (a `deepseq` return ()) >> put__ iv a