kquick / Thespian

Python Actor concurrency library

OOM-Killer reaping some top-level actors

clairdelaluna opened this issue

Hi Kevin,

I've noticed that if the OOM-Killer is invoked and kills an actor managed by the MultiProcAdmin, that actor is not restarted, which can cause problems. Curiously, even if the killed actor has created child actors, the children are not killed (but they do nothing, since they depend on messages from the parent to run). Is this known behaviour? If so, is there a best practice for handling it? I was thinking of creating a supervising top-level actor that monitors the existence of the other actors and restarts them if it notices they are missing.

Thank you!

@clairdelaluna

First, let me apologize for taking so long to respond. Apparently GitHub's notifications page doesn't show me when a new issue is posted to one of my repositories... argh. This has been a problem in the past and I've been trying various tweaks, but even "Watching" my own repositories doesn't seem to help.

As to the problem, the MultiProcAdmin provides administrative handling for the internals of Thespian, but it is not a proper "Actor" by itself (see https://thespianpy.com/doc/using.html#hH-91e3dc4e-af22-4076-94ee-43f60458f0e0). Because there may be conditions where a top-level Actor should not be restarted, or where the restart requires additional initialization messages, the MultiProcAdmin will not perform restarts of these top-level actors.

The children are unaffected because it is the parent actor that would normally generate/propagate an ActorExitRequest to them. The OOM killer is a rather drastic intervention: it terminates the process with SIGKILL, which cannot be caught, so the parent never gets a chance to run a normal shutdown and the process is simply gone.

Your approach is the one I would recommend: keep the top-level actors very simple, with their primary task being just to monitor your main actor(s) and restart them when appropriate, and move all the principal functionality into the actors managed by the top-level actor.