awkward-squad / ki

A structured concurrency library

Home Page:https://hackage.haskell.org/package/ki

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Nested scope stuck on Data.ByteString.hGetLine

TristanCacqueray opened this issue · comments

Hello, it looks like a scope may not terminate properly when reading the output of an external process using hGetLine.

When running the following example I get:

$ cabal run ki-blocked-repro.hs
Cat is running...
Cat is running...
Exiting main scope...
  C-c C-ccabal-script-ki-blocked-repro.hs: thread blocked indefinitely in an STM transaction

The program hangs until it is interrupted with Ctrl-C.

When trying to minify the reproducer, I noticed that removing the nested scope, by replacing Ki.scoped catAction with catAction scope, the issue disappear: the program exit cleanly. Thus it seems like a bug in ki, but I can't tell the root cause, and perhaps it's related to the process library and how the exceptions are masked?

{- cabal:
build-depends: base, bytestring, ki, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where

import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process.Typed as ProcessTyped

catAction :: Ki.Scope -> IO ()
catAction scope =
    ProcessTyped.withProcessWait_ cmd $ \p -> do
        Ki.fork_ scope $ forever $ BS.hGetLine (ProcessTyped.getStdout p)
        forever $ do
            putStrLn $ "Cat is running..."
            threadDelay 300000
  where
    cmd =
        ProcessTyped.setStdout ProcessTyped.createPipe $
            ProcessTyped.proc "cat" []

main :: IO ()
main = do
    Ki.scoped $ \scope -> do
        _ <- scope `Ki.fork` Ki.scoped catAction
        threadDelay 500000
        putStrLn "Exiting main scope..."
    putStrLn "This is the end."

Thanks for the report, I'll look into this

It's a real bug :)

We have a PR up here with some details: #20

Once we confirm your repro runs without failing, we'll get a patch version release up on Hackage. Versions 1.0.0 and 1.0.0.1 are affected, so I'll mark those as deprecated.

Thank you for the bug report and working repro!

Here's the behavior I'm seeing on that branch:

Cat is running...
Cat is running...
Exiting main scope...
<I press Enter>
kibug19: fd:11: Data.ByteString.hGetLine: illegal operation (handle is closed)

That's certainly a step up from

Cat is running...
Cat is running...
Exiting main scope...
<I press Enter>
kibug19: thread blocked indefinitely in an STM transaction

However, I'm not certain what's going on with the <I press Enter> step. My instinct is that I shouldn't have to press Enter here: we deliver an asynchronous exception to the thread that's running withProcessWait_, and it should die promptly.

It's possible a bug lies somewhere in typed-process cleanup code.

I modified your repro to use process instead:

{- cabal:
build-depends: base, bytestring, ki, process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where

import Control.Concurrent (threadDelay)
import Prelude
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process as Process

catAction :: Ki.Scope -> IO ()
catAction scope =
    Process.withCreateProcess cmd $ \_ (Just stdout) _ process -> do
        Ki.fork_ scope $ forever $ BS.hGetLine stdout
        forever $ do
            putStrLn $ "Cat is running..."
            threadDelay 300000
  where
    cmd =
        (Process.proc "cat" []) {Process.std_out = Process.CreatePipe}

main :: IO ()
main = do
    Ki.scoped $ \scope -> do
        _ <- scope `Ki.fork` Ki.scoped catAction
        threadDelay 500000
        putStrLn "Exiting main scope..."
    putStrLn "This is the end."

And it exits without me having to press Enter (which is also why there's no handle is closed exception: hGetLine never reads anything):

Cat is running...
Cat is running...
Exiting main scope...
This is the end.

Thank you for looking into that so quickly, I really appreciate. It was not obvious what was the issue since I built a supervision tree layer on top of ki and I'm glad the simple reproducer is helpful.

Though I'm not sure the proposed fix is enough, I got the same result (handle is closed instead of thread blocked indefinitely in an STM transaction) with typed-process. And the process version works correctly without #20.

It looks like this can be related to the issue described in fpco/typed-process#38 (comment) . But I wonder why the bug does not happen when only a single scope is at play, e.g. this works with or without #20:

{- cabal:
build-depends: base, bytestring, ki, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N -with-rtsopts=-T
-}
module Main where

import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import qualified Data.ByteString as BS
import qualified Ki as Ki
import qualified System.Process.Typed as ProcessTyped

catAction :: Ki.Scope -> IO ()
catAction scope =
    ProcessTyped.withProcessWait_ cmd $ \p -> do
	Ki.fork_ scope $ forever $ BS.hGetLine (ProcessTyped.getStdout p)
	forever $ do
            putStrLn $ "Cat is running..."
            threadDelay 300000
  where
    cmd =
        ProcessTyped.setStdout ProcessTyped.createPipe $
            ProcessTyped.proc "cat" []

main :: IO ()
main = do
    Ki.scoped $ \scope -> do
	_ <- scope `Ki.fork` catAction scope
	threadDelay 500000
	putStrLn "Exiting main scope..."
    putStrLn "This is the end."

Of course!

Though I'm not sure the proposed fix is enough, I got the same result (handle is closed instead of thread blocked indefinitely in an STM transaction) with typed-process.

I believe this is expected. Despite being syntactically nested within the scope of the cat process, the background thread that is running hGetLine on its stdout indeed outlives it, and is only killed by its parent scope after withProcessWait_ returns.

That's just a partial answer, though: we will look into that typed-process issue and see if we can piece together a satisfying explanation for the difference between process and typed-process.

Looking into typed-process we see that this is a bug in the example program rather than a bug in Ki:

Ki throws a ScopeClosing exception to the thread that called withProcessWait_, and withProcessWait_ enters an exception handler that is meant to cleanup after itself (close handles, kill the forked process, etc). The handler first attempts to close the stdout handle with hClose, which attempts to acquire the MVar lock within Handle. The lock is currently held by hGetLine, and hGetLine does not yield this lock until it reads a newline character, so hClose is blocked, which in turn blocks propagating the ScopeClosing exception.

Pressing enter allows hGetLine to return which unblocks withProcessWait_'s exception handler. At this point the handle is closed and the withProcessWait_ thread will soon propagate the ScopeClosing exception to the hGetLine thread. There is a race here: if the hGetLine thread calls hGetLine before receiving the ScopeClosing exception then we will see a handle is closed exception, otherwise we will see nothing.

The "hGetLine blocks hClose" behavior can be observed in the following example:

{- cabal:
build-depends: base, process, typed-process
ghc-options: -threaded -rtsopts -with-rtsopts=-N
-}

import Control.Concurrent
import Control.Monad
import System.IO (hClose, hGetLine)
import System.Process qualified as Process

main :: IO ()
main = do
  (_, Just stdout, _, _) <- Process.createProcess (Process.proc "cat" []) {Process.std_out = Process.CreatePipe}
  forkIO (void (hGetLine stdout))
  threadDelay 1000000
  hClose stdout

process is different in that withCreateProcess's exception handler first calls terminateProcess which causes hGetLine to throw an EOF exception and yield the mvar lock (source). See also the warning in that linked source that forked threads reading these handles should be killed off to avoid this hClose deadlock.

Let us know if there's anything else to address @TristanCacqueray!

If I understand correctly, without the intermediary scope, the thread that calls hGetLine is somehow killed before the thread running withProcessWait, and that results in the correct behavior. But when using an extra scope, then the kill order is inverted, resulting in a withProcessWait deadlock. This revealed an issue fixed by #20, where the scoped function should have been waiting on the mvar based deadlock, instead of introducing its own stm based deadlock.

I guess it's better to use withCreateProcess to avoid this situation. Thanks again for the super fast feedback, that's amazing :-)

If I understand correctly, without the intermediary scope, the thread that calls hGetLine is somehow killed before the thread running withProcessWait, and that results in the correct behavior. But when using an extra scope, then the kill order is inverted, resulting in a withProcessWait deadlock.

It isn't that the kill order is inverted, but rather than without the intermediary scope both threads (the withProcessWait thread and the hGetLinethread) are forked from the same scope. When the outermost scope is closing it throws ScopeClosing to its immediate children. The hGetLine thread handles this exception by releasing the mvar and rethrowing, unblocking the withProcessWait thread that calls hClose.

When the intermediary scope is installed there is only one immediate child, the thread calling withProcessWait, and it will propagate ScopeClosing downward to its immediate child (the hGetLine thread) once it hits the exception handler installed by Ki. This won't happen until after the exception handler installed by withProcessWait finishes and rethrows ScopeClosing.

The bug in Ki that was revealed by this issue (although not triggered) is that if a child thread handles a ScopeClosing exception by throwing a different exception a deadlock is created. Details and a test are in #20 if you are interested.