gleam-lang / otp

📫 Fault tolerant multicore programs with actors

Home Page: https://hexdocs.pm/gleam_otp/

[Feature] DynamicSupervisor please

sclee15 opened this issue

Hello.

I think Gleam's typed OTP and its concept of subjects are great.

But it would be better to have some kind of DynamicSupervisor that allows me to spawn workers as needed.

Sounds great!

I'm also interested in this.

In terms of API design, do you see any value in having a separate supervisor and dynamic supervisor (afaict this is the case in Elixir), or should there just be a single supervisor type that happens to be dynamic? If you want it to be static, you simply don't add more children to it. I'm personally leaning towards the option of just having a single type that is dynamic.

Also, should the supervisor still have an init function to set up initial children, or do you just create a supervisor and start adding children to it dynamically?

Option 1

pub fn main() {
  let assert Ok(sup) = supervisor.start(fn(children) {
    children
    |> add(worker(database.start))
    |> add(worker(monitoring.start))
    |> add(worker(web.start))
  })

  // Something happens in between

  let assert Ok(runner) = supervisor.add_child(sup, worker(runner.start))
}

Option 2

pub fn main() {
  let assert Ok(sup) = supervisor.start()
  let assert Ok(db) = supervisor.add_child(sup, worker(database.start))
  let assert Ok(mon) = supervisor.add_child(sup, worker(monitoring.start))
  let assert Ok(web) = supervisor.add_child(sup, worker(web.start))

  // Something happens in between

  let assert Ok(runner) = supervisor.add_child(sup, worker(runner.start))
}

I see the value in not breaking the existing API, but I also find option 2 to be a slightly simpler API.

With option 2 how does it restart the children when one dies?

I haven't really looked into the current implementation but I'm assuming it will need to keep some kind of list of children.

Does it currently call the init function every time a child dies?

Nope, the current supervisor implements the rest_for_one strategy. I think we likely need supervisors that implement all the different strategies, and possibly some other patterns that may be useful given Gleam OTP's lack of process naming.
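
For context, rest_for_one means that when a child crashes, that child and every child defined after it are restarted, in order. The current API leans on this: each child's start function receives an argument built up by the children before it (via returning), so later children are always restarted with fresh subjects for the earlier ones. A rough sketch from memory of how that looks today (database and web are stand-in modules, not part of the library):

pub fn start_tree() {
  supervisor.start(fn(children) {
    children
    // database.start returns a subject, and `returning` feeds it into
    // the argument given to every child added after this one.
    |> add(
      worker(database.start)
      |> returning(fn(_arg, db) { db }),
    )
    // web.start is always (re)started with the current database subject,
    // because a database crash also restarts this child (rest_for_one).
    |> add(worker(web.start))
  })
}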

Oh I see, to be honest I wasn't too familiar with all the different strategies.

I guess rest_for_one was the most logical one to start with, since it lets you control the arguments down the chain (so you don't hold an old reference to a subject belonging to a process that has already died)?

So some thoughts:

  • rest_for_one
    • The init function will be kept (no matter the strategy picked). This is the only place to set up the arguments.
    • All dynamic children added afterwards will be put at the back of the "chain" (as if they were just last in the init function). So they're always restarted if any child from the init function dies.
      • Possibly the dynamic children alone can be configured as one_for_one or one_for_all. So if any of the dynamic children die either only that one or all of them are restarted but the initial ones are unaffected.
  • one_for_all
    • This one can pretty much be treated as rest_for_one except everything is started from scratch, right?
  • one_for_one
    • This one is simple except for argument passing.
    • Should there be a function to set the argument that the supervisor passes to children, settable externally (e.g. supervisor.set_arguments(sup, MyArguments(...)))? See the sketch after this list.
      • How does this affect already running children?
    • Should the supervisor maintain a registry that children can (optionally) use to register themselves by name, and which is passed to the children?
      • This would need to send an update to all children whenever the registry changes, which is not optimal.
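
To make the one_for_one idea a bit more concrete, this is roughly what I have in mind (every function below is hypothetical, none of it exists yet, and MyArguments is just a placeholder):

pub fn main() {
  // Hypothetical one_for_one supervisor: children are independent, so
  // only the child that died is restarted, using whatever argument the
  // supervisor currently holds.
  let assert Ok(sup) = supervisor.start_one_for_one()
  supervisor.set_arguments(sup, MyArguments(...))

  // Each worker's start function receives the supervisor's current
  // arguments, both when first added and on every restart.
  let assert Ok(runner) = supervisor.add_child(sup, worker(runner.start))

  // The open question from above: does this affect the already running
  // `runner`, or only children (re)started after this point?
  supervisor.set_arguments(sup, MyArguments(...))
}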

Or would it be preferred to have distinct static and dynamic supervisors?

I don't think we could safely restart any dynamically added children, as the initial state that was used to create them is not controlled by the supervisor.

Take a web server that does some background processing as an example. It could have a web server process, a database connection process, and dynamically added worker processes.

If there were a failure that caused them all to be restarted, the web application and database connection processes would be initialised correctly, but if any of the worker processes were restarted using their original initial state, they would hold references to the no-longer-existing database connection process and so would always fail. This would eventually exceed the restart intensity and the entire supervisor would fail.
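
Concretely, with something like the option 2 sketch above, the start closure of a dynamically added worker captures whatever subjects existed at the moment add_child was called (add_child, sup, db, and runner are all from the hypothetical sketches in this thread):

// The initial children are fine: their start functions are re-run
// through the init chain, so they always receive fresh subjects. This
// dynamically added worker, however, closed over the `db` subject that
// existed when it was added.
let assert Ok(_runner) =
  supervisor.add_child(sup, worker(fn(_) { runner.start(db) }))

// After the whole tree restarts, database.start has produced a new
// subject, but this closure still points at the dead one, so the worker
// fails on every restart until the restart intensity is exceeded.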

I see. Given my limited experience using supervisors I might not be the best person to come up with designs for this 😅.

Would a more typical use case add the dynamic supervisor as a child of a static one so that the whole dynamic supervisor is restarted if any of its dependencies crash (web server, database, ...)?

If so, then two distinct types of supervisors might make more sense.
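
Roughly what I am picturing, reusing the current static supervisor for the outer tree (dynamic_supervisor here is entirely made up; supervisor.supervisor is the existing helper for a supervisor child, if I remember it right):

pub fn main() {
  // The static supervisor owns the dependencies, and the dynamic
  // supervisor is its last child. If the database or web server dies,
  // the whole dynamic subtree is thrown away and restarted, rather than
  // restarting stale workers inside it.
  let assert Ok(_sup) =
    supervisor.start(fn(children) {
      children
      |> add(worker(database.start))
      |> add(worker(web.start))
      |> add(supervisor.supervisor(dynamic_supervisor.start))
    })
}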