Nested scope stuck on Data.ByteString.hGetLine
TristanCacqueray opened this issue · comments
Hello, it looks like a scope may not terminate properly when reading the output of an external process using hGetLine
.
When running the following example I get:
$ cabal run ki-blocked-repro.hs
Cat is running...
Cat is running...
Exiting main scope...
C-c C-ccabal-script-ki-blocked-repro.hs: thread blocked indefinitely in an STM transaction
The program hangs until it is interrupted with Ctrl-C.
When trying to minify the reproducer, I noticed that removing the nested scope, by replacing Ki.scoped catAction
with catAction scope
, the issue disappear: the program exit cleanly. Thus it seems like a bug in ki
, but I can't tell the root cause, and perhaps it's related to the process library and how the exceptions are masked?
{- cabal:
build-depends: base, bytestring, ki, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process.Typed as ProcessTyped
catAction :: Ki.Scope -> IO ()
catAction scope =
ProcessTyped.withProcessWait_ cmd $ \p -> do
Ki.fork_ scope $ forever $ BS.hGetLine (ProcessTyped.getStdout p)
forever $ do
putStrLn $ "Cat is running..."
threadDelay 300000
where
cmd =
ProcessTyped.setStdout ProcessTyped.createPipe $
ProcessTyped.proc "cat" []
main :: IO ()
main = do
Ki.scoped $ \scope -> do
_ <- scope `Ki.fork` Ki.scoped catAction
threadDelay 500000
putStrLn "Exiting main scope..."
putStrLn "This is the end."
Thanks for the report, I'll look into this
It's a real bug :)
We have a PR up here with some details: #20
Once we confirm your repro runs without failing, we'll get a patch version release up on Hackage. Versions 1.0.0
and 1.0.0.1
are affected, so I'll mark those as deprecated.
Thank you for the bug report and working repro!
Here's the behavior I'm seeing on that branch:
Cat is running...
Cat is running...
Exiting main scope...
<I press Enter>
kibug19: fd:11: Data.ByteString.hGetLine: illegal operation (handle is closed)
That's certainly a step up from
Cat is running...
Cat is running...
Exiting main scope...
<I press Enter>
kibug19: thread blocked indefinitely in an STM transaction
However, I'm not certain what's going on with the <I press Enter>
step. My instinct is that I shouldn't have to press Enter here: we deliver an asynchronous exception to the thread that's running withProcessWait_
, and it should die promptly.
It's possible a bug lies somewhere in typed-process
cleanup code.
I modified your repro to use process
instead:
{- cabal:
build-depends: base, bytestring, ki, process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where
import Control.Concurrent (threadDelay)
import Prelude
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process as Process
catAction :: Ki.Scope -> IO ()
catAction scope =
Process.withCreateProcess cmd $ \_ (Just stdout) _ process -> do
Ki.fork_ scope $ forever $ BS.hGetLine stdout
forever $ do
putStrLn $ "Cat is running..."
threadDelay 300000
where
cmd =
(Process.proc "cat" []) {Process.std_out = Process.CreatePipe}
main :: IO ()
main = do
Ki.scoped $ \scope -> do
_ <- scope `Ki.fork` Ki.scoped catAction
threadDelay 500000
putStrLn "Exiting main scope..."
putStrLn "This is the end."
And it exits without me having to press Enter (which is also why there's no handle is closed
exception: hGetLine
never reads anything):
Cat is running...
Cat is running...
Exiting main scope...
This is the end.
Thank you for looking into that so quickly, I really appreciate. It was not obvious what was the issue since I built a supervision tree layer on top of ki and I'm glad the simple reproducer is helpful.
Though I'm not sure the proposed fix is enough, I got the same result (handle is closed
instead of thread blocked indefinitely in an STM transaction
) with typed-process
. And the process
version works correctly without #20.
It looks like this can be related to the issue described in fpco/typed-process#38 (comment) . But I wonder why the bug does not happen when only a single scope is at play, e.g. this works with or without #20:
{- cabal:
build-depends: base, bytestring, ki, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process.Typed as ProcessTyped
catAction :: Ki.Scope -> IO ()
catAction scope =
ProcessTyped.withProcessWait_ cmd $ \p -> do
Ki.fork_ scope $ forever $ BS.hGetLine (ProcessTyped.getStdout p)
forever $ do
putStrLn $ "Cat is running..."
threadDelay 300000
where
cmd =
ProcessTyped.setStdout ProcessTyped.createPipe $
ProcessTyped.proc "cat" []
main :: IO ()
main = do
Ki.scoped $ \scope -> do
_ <- scope `Ki.fork` catAction scope
threadDelay 500000
putStrLn "Exiting main scope..."
putStrLn "This is the end."
Of course!
Though I'm not sure the proposed fix is enough, I got the same result (handle is closed instead of thread blocked indefinitely in an STM transaction) with typed-process.
I believe this is expected. Despite being syntactically nested within the scope of the cat
process, the background thread that is running hGetLine
on its stdout indeed outlives it, and is only killed by its parent scope after withProcessWait_
returns.
That's just a partial answer, though: we will look into that typed-process
issue and see if we can piece together a satisfying explanation for the difference between process
and typed-process
.
Looking into typed-process
we see that this is a bug in the example program rather than a bug in Ki
:
Ki
throws a ScopeClosing
exception to the thread that called withProcessWait_
, and withProcessWait_
enters an exception handler that is meant to cleanup after itself (close handles, kill the forked process, etc). The handler first attempts to close the stdout handle with hClose
, which attempts to acquire the MVar
lock within Handle
. The lock is currently held by hGetLine
, and hGetLine
does not yield this lock until it reads a newline character, so hClose
is blocked, which in turn blocks propagating the ScopeClosing
exception.
Pressing enter allows hGetLine
to return which unblocks withProcessWait_
's exception handler. At this point the handle is closed and the withProcessWait_
thread will soon propagate the ScopeClosing
exception to the hGetLine
thread. There is a race here: if the hGetLine
thread calls hGetLine
before receiving the ScopeClosing
exception then we will see a handle is closed
exception, otherwise we will see nothing.
The "hGetLine
blocks hClose
" behavior can be observed in the following example:
{- cabal:
build-depends: base, process, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N
-}
import Control.Concurrent
import Control.Monad
import System.IO (hClose, hGetLine)
import System.Process qualified as Process
main :: IO ()
main = do
(_, Just stdout, _, _) <- Process.createProcess (Process.proc "cat" []) {Process.std_out = Process.CreatePipe}
forkIO (void (hGetLine stdout))
threadDelay 1000000
hClose stdout
process
is different in that withCreateProcess
's exception handler first calls terminateProcess
which causes hGetLine
to throw an EOF exception and yield the mvar lock (source). See also the warning in that linked source that forked threads reading these handles should be killed off to avoid this hClose
deadlock.
Let us know if there's anything else to address @TristanCacqueray!
If I understand correctly, without the intermediary scope, the thread that calls hGetLine
is somehow killed before the thread running withProcessWait
, and that results in the correct behavior. But when using an extra scope, then the kill order is inverted, resulting in a withProcessWait
deadlock. This revealed an issue fixed by #20, where the scoped
function should have been waiting on the mvar based deadlock, instead of introducing its own stm based deadlock.
I guess it's better to use withCreateProcess
to avoid this situation. Thanks again for the super fast feedback, that's amazing :-)
If I understand correctly, without the intermediary scope, the thread that calls hGetLine is somehow killed before the thread running withProcessWait, and that results in the correct behavior. But when using an extra scope, then the kill order is inverted, resulting in a withProcessWait deadlock.
It isn't that the kill order is inverted, but rather than without the intermediary scope both threads (the withProcessWait
thread and the hGetLine
thread) are forked from the same scope. When the outermost scope is closing it throws ScopeClosing
to its immediate children. The hGetLine
thread handles this exception by releasing the mvar and rethrowing, unblocking the withProcessWait
thread that calls hClose
.
When the intermediary scope is installed there is only one immediate child, the thread calling withProcessWait
, and it will propagate ScopeClosing
downward to its immediate child (the hGetLine
thread) once it hits the exception handler installed by Ki
. This won't happen until after the exception handler installed by withProcessWait
finishes and rethrows ScopeClosing
.
The bug in Ki
that was revealed by this issue (although not triggered) is that if a child thread handles a ScopeClosing
exception by throwing a different exception a deadlock is created. Details and a test are in #20 if you are interested.