7mind / izumi

Productivity-oriented collection of lightweight fancy stuff for Scala toolchain

Home Page: https://izumi.7mind.io

The only RoleTask does not exit after completion

amricko0b opened this issue

Greetings!

I've got a cats-effect IO based application with only a single RoleTask bootstrapped. Distage version: 1.0.8.
My RoleTask contains a long, blocking piece of code, but it doesn't block forever.

I thought that after the role's start() method completes, the application would exit, but it doesn't.
Am I wrong to assume this behaviour?

My role looks something like this:

class MyTaskRole(
    sender: MySender[IO],
    @Id("myTask-role-logger") logger: Logger[IO]
) extends RoleTask[IO] {

  override def start(
      roleParameters: RawEntrypointParams,
      freeArgs: Vector[String]
  ): IO[Unit] =
    for {
      _ <- logger.info("Running sending task")
      _ <- sender.sendBatchesUntilEmpty
      _ <- logger.info("Sending task complete!")
    } yield ()
}

object MyTaskRole extends RoleDescriptor {
  override def id: String = "myTaskRole"

  object Plugin extends PluginDef {
    include(
      new RoleModuleDef {
        makeRole[MyTaskRole]
      }
    )
  }
}
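
MySender isn't shown above; as a rough, illustrative sketch (the names and body below are invented for this issue, not my real implementation), it is a loop that keeps sending batches until nothing is left:

import cats.effect.IO

// Illustrative sketch only: pull a batch via a blocking call and send it,
// terminating once the source is drained.
trait MySender[F[_]] {
  def sendBatchesUntilEmpty: F[Unit]
}

final class MySenderImpl(fetchBatch: () => List[String], sendBatch: List[String] => Unit)
    extends MySender[IO] {
  override def sendBatchesUntilEmpty: IO[Unit] =
    IO.delay(fetchBatch()).flatMap {
      case Nil   => IO.unit                                                        // drained, stop
      case batch => IO.delay(sendBatch(batch)).flatMap(_ => sendBatchesUntilEmpty) // send, repeat
    }
}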

sender.sendBatchesUntilEmpty is a blocking call that performs a long-running task.
After it completes, I see the "Sending task complete!" message in the logs.
I also observe this line:

I 2021-08-04T17:48:26.407 (RoleAppEntrypoint.scala:82) …AppEntrypoint.Impl.runRoles [31:ioapp-compute-0] phase=late No services to run, exiting...

But, as I have already mentioned, the application does not exit.

Here is my launcher, in case it helps somehow.
I haven't overridden anything here.

object Launcher extends LauncherCats[IO] {

  implicit val ec: ExecutionContextExecutor =
    ExecutionContext.fromExecutor(
      Executors.newFixedThreadPool(10)
    )

  implicit val contextShiftIO: ContextShift[IO] = IO.contextShift(ec)

  implicit val concurrentEffectIO: ConcurrentEffect[IO] = IO.ioConcurrentEffect

  implicit val timerIO: Timer[IO] = IO.timer(ec)

  override protected def pluginConfig: PluginConfig =
    PluginConfig.const(
      Seq(
        MyTaskRole.Plugin,
        DI
      )
    )

  object DI extends PluginDef {
    // a bunch of include(...) directives
  }
}

Sounds like a bug. I'll look at it. Thanks.

I've tried to reproduce this in 634aea5, but the demo app works exactly as expected.

Could you check whether your app works on the latest snapshot? If it doesn't, could you modify my repro so that it actually reproduces the issue, please?

Many thanks for your attention, @pshirshov!

I have checked your reproduction; it works as expected for me as well.
Switching my app to 1.0.9-SNAPSHOT did not help either, but it led me to some sort of explanation.

You see, in your reproduction the task has no dependencies.
The actual task in my app, however, does (that's my fault, I hadn't mentioned it, sorry): it depends on MySender, which in turn depends on a KafkaProducer (from fs2-kafka) that is used for the actual sending. And this producer seems to be to blame.

So the complete example task looks like this. Here I use KafkaProducer directly for clarity:

class MyTaskRole(
    producer: KafkaProducer[IO, String, String],
    @Id("myTask-role-logger") logger: Logger[IO]
) extends RoleTask[IO] {

  override def start(
      roleParameters: RawEntrypointParams,
      freeArgs: Vector[String]
  ): IO[Unit] =
    for {
      _ <- logger.info("Running sending task")
      // _ <- producer.produce(...) we can skip this call entirely; it does not change the behaviour
      _ <- logger.info("Sending task complete!")
    } yield ()
}

The producer is declared as follows:

object DI extends PluginDef {
  // ...

  include(
    new ModuleDef {
      make[KafkaProducer[IO, String, String]].fromResource {
        KafkaProducer[IO].resource(
          ProducerSettings(
            keySerializer = Serializer.string[IO],
            valueSerializer = Serializer.string[IO]
          ).withBootstrapServers("localhost:9092")
            .withCloseTimeout(2.seconds)
        )
      }
    }
  )

  // ...
}

If I exclude the Kafka producer from the task's dependencies (by simply removing the field), everything works as expected: the role exits after completion.

It seems the problem is hidden either in KafkaProducer's resource (distage's .fromResource directive works fine for other long-lived resources) or in my misunderstanding of how to use it.
So it could be a bug, but it doesn't seem related to distage (probably to fs2-kafka, though I'm not sure; I'll continue my investigation).
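
For instance, a standalone check like the following (a hypothetical sketch that reuses the settings from above, not code from my app) should show whether the hang reproduces with fs2-kafka alone, without distage in the picture:

import cats.effect.{ExitCode, IO, IOApp}
import fs2.kafka.{KafkaProducer, ProducerSettings, Serializer}

// Acquire and release the producer resource directly; if the JVM still hangs
// after "done" is printed, distage is not involved in the problem.
object ProducerCheck extends IOApp {
  def run(args: List[String]): IO[ExitCode] =
    KafkaProducer[IO]
      .resource(
        ProducerSettings(
          keySerializer = Serializer.string[IO],
          valueSerializer = Serializer.string[IO]
        ).withBootstrapServers("localhost:9092")
      )
      .use(_ => IO(println("done")))
      .map(_ => ExitCode.Success)
}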

Hence, I think this issue can be closed for now.
Thanks for your help, and sorry again for the incomplete details.

Could you create a separate demo project with a repro, please? In any case this shouldn't happen, and we would be happy to investigate. If it turns out to be a bug in fs2 or cats, we'll fix it anyway :)

Sorry for the delay, I had to dedicate some time to work :(

I made a little demo project with a reproduction.
It can be found here.

I should also note that the problem disappears when I switch from the custom ExecutionContext (based on a FixedThreadPool) to the global one.
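
For illustration, the switch only changes the executor behind the implicits in the Launcher above (the global pool's threads are daemonic):

implicit val ec: ExecutionContextExecutor = ExecutionContext.global
implicit val contextShiftIO: ContextShift[IO] = IO.contextShift(ec)
implicit val timerIO: Timer[IO] = IO.timer(ec)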

@amricko0b
Thanks for the reproduction!
The problem in this demo is caused by creating a global execution context that is never closed and whose threads are non-daemon, which prevents the JVM from exiting.

Solution: https://github.com/amricko0b/distage-1557/pull/1
In detail, this causes the JVM to hang on shutdown. It can generally be prevented either by making the thread pool's threads daemonic using a custom ThreadFactory:

import java.util.concurrent.{Executors, ThreadFactory}
import scala.util.chaining.scalaUtilChainingOps

// daemon threads do not keep the JVM alive after the main thread finishes
Executors.newFixedThreadPool(16, new ThreadFactory {
  def newThread(r: Runnable): Thread = new Thread(r).tap(_.setDaemon(true))
})

Or by shutting the underlying ExecutorService down properly via its .shutdown() method when the application terminates. Distage provides a Lifecycle.fromExecutorService helper for that:

make[ExecutionContext].fromResource {
  // the pool is shut down when the Lifecycle is released, i.e. on application exit
  Lifecycle
    .fromExecutorService(Executors.newFixedThreadPool(16))
    .map(ExecutionContext.fromExecutor(_))
}

Also, I think this problem can be avoided in the future with this change: #1585. We already supply default bindings for ExecutionContext, but only for ZIO right now; I fixed that oversight in the PR, so you'll be able to use the default ExecutionContext @Id("cpu") without having to make a custom one.
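
For illustration, once that lands, a component could just summon the framework-provided pool instead of building one by hand (hypothetical usage assuming the default binding from #1585; MyComponent is a made-up name):

import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}
import izumi.distage.model.definition.Id

// Hypothetical sketch: depend on the framework-provided pool via @Id("cpu")
// rather than constructing a thread pool in the launcher.
final class MyComponent(@Id("cpu") ec: ExecutionContext)(implicit cs: ContextShift[IO]) {
  def runOnCpuPool[A](fa: IO[A]): IO[A] = cs.evalOn(ec)(fa)
}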

Oh, now I understand. Thanks for the explanation.
My investigation took me deep into the details, but the answer was much simpler :D

By the way, #1585 is a great idea; I'll wait for that feature, but for now I'll try the second option with the helper :)

I suppose this issue can be closed now.
Again, thanks for your answers and patience :)