ostinelli / syn

A scalable global Process Registry and Process Group manager for Erlang and Elixir.

Add a timeout option to multi_call

essen opened this issue

It would be good to have the ability to configure the timeout value for syn:multi_call, for example a multi_call/3 with an added Timeout argument. 5000 is a good default but it doesn't fit all use cases.

Additionally, it might be worth monitoring the recipients so the call can return early if one of them crashes. I think this functionality should mirror the gen:call mechanism as much as possible, otherwise it can be dangerous.
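
For illustration, a call with the proposed third argument could look roughly like this from the caller's side (the group name, message, and 10000 ms value are invented for the example; at this point only multi_call/2 with the fixed 5000 ms timeout exists):

get_statuses() ->
    %% hypothetical usage of the proposed multi_call/3: identical to
    %% multi_call/2, but with an explicit timeout in milliseconds
    Timeout = 10000,
    {Replies, BadPids} = syn:multi_call(my_group, get_status, Timeout),
    %% Replies is a list of {Pid, Reply}; pids that crashed or did not
    %% answer in time end up in BadPids
    {Replies, BadPids}.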

Very nice suggestion. On top of the configurable timeout, are you suggesting to monitor all of the recipients of the multi_call, and add the ones that eventually crash to the list of BadPids?

Yes. The timeout is great as an upper limit if the process is too busy or locked, but if it crashed or the node is down we definitely want to know sooner, especially for larger timeout values.

Ok. Unless someone submits a PR for this I will be on it asap.

I'm not sure my customer will settle on syn yet so that's all the time I can dedicate right now. :-) But I will check again later on. Thanks for considering it.

@essen, this should conclude the requested enhancement. This implementation:

  • Allows passing a custom Timeout value.
  • Monitors recipient processes so that, if they exit, they are immediately added to the list of BadPids.

Let me know if this works for you and I'll package a new Syn.
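
In practice, the pattern described above is a monitored, timeout-bounded receive per recipient. A simplified sketch (not syn's actual code; the helper name and the outgoing message shape are assumptions, while the reply and 'DOWN' tuples follow the fragments quoted later in this thread):

multi_call_and_receive(CollectorPid, Pid, Message, Timeout) ->
    %% monitor the recipient so a crash (or nodedown) is seen immediately
    %% instead of waiting for the full Timeout
    MonitorRef = erlang:monitor(process, Pid),
    Pid ! {syn_multi_call, self(), Message},
    receive
        {syn_multi_call_reply, Pid, Reply} ->
            erlang:demonitor(MonitorRef, [flush]),
            CollectorPid ! {reply, Pid, Reply};
        {'DOWN', MonitorRef, _, _, _} ->
            %% the recipient died before answering: report it as a bad pid
            CollectorPid ! {bad_pid, Pid}
    after Timeout ->
        erlang:demonitor(MonitorRef, [flush]),
        %% no answer within Timeout: also reported as a bad pid
        CollectorPid ! {bad_pid, Pid}
    end.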

Now that you have a reference (the monitor) the next logical thing to do would be to enable the selective receive optimization (when a reference is created in the same function as a receive block that includes said reference in all clauses). But this implies changing the interface for the messages since you need to pass that reference.

Otherwise looks good, and I don't use that function yet in this project so might be a while before I can confirm it works. On the other hand it looks like we'll be using it so thumbs up. :-)

> Now that you have a reference (the monitor) the next logical thing to do would be to enable the selective receive optimization (when a reference is created in the same function as a receive block that includes said reference in all clauses)

Isn't this what I'm doing?

%% the clause matching the monitor's 'DOWN' message for a crashed recipient
{'DOWN', MonitorRef, _, _, _} ->

> On the other hand it looks like we'll be using it so thumbs up. :-)

Good to know :)

You need to do it here also:

{syn_multi_call_reply, Pid, Reply} ->

If the receiving process has a large mailbox, having a reference in that message avoids scanning the whole mailbox every time a message is received.
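
The suggested change, roughly, is to let the reference travel with the call and come back in the reply, so that every clause of the receive matches on it. A sketch of one possible shape (the extra MonitorRef field in the messages is hypothetical, not syn's actual format):

multi_call_and_receive(CollectorPid, Pid, Message, Timeout) ->
    MonitorRef = erlang:monitor(process, Pid),
    %% pass the reference along so the recipient can echo it in its reply
    Pid ! {syn_multi_call, self(), MonitorRef, Message},
    receive
        {syn_multi_call_reply, MonitorRef, Reply} ->
            %% every clause now matches on a reference created in this same
            %% function, which enables the selective receive optimization
            erlang:demonitor(MonitorRef, [flush]),
            CollectorPid ! {reply, Pid, Reply};
        {'DOWN', MonitorRef, _, _, _} ->
            CollectorPid ! {bad_pid, Pid}
    after Timeout ->
        erlang:demonitor(MonitorRef, [flush]),
        CollectorPid ! {bad_pid, Pid}
    end.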

Got it, but this is not the case here.

For every recipient, a new process is spawned to send out the call and collect the response; you can see that one process is spawned per recipient here. These spawned processes are dedicated to receiving the response from a single recipient, so the mailbox can never grow big in multi_call_and_receive/4: the recipient either responds, crashes, or this process' receive times out.

The real collection of the responses, proxied by the spawned processes, happens here, where the receive will match one of two clauses (so no mailbox issue there either).

The reason I'm spawning a process to send to and collect from each recipient is to ensure the timeout is global and does not get refreshed every time a message is received by the collecting process. Alternatively, instead of spawning processes I could handle everything in a single process and use timer:send_after/2 to send a message to self() when the timeout has passed. However, I don't see any particular benefit of one method over the other (possibly the creation of a single timer instead of many, but I'd have to check whether this has any real performance impact).
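
For comparison, the single-process alternative mentioned here would look roughly like this (a sketch only; the timeout atom and function names are invented, and crash monitoring is omitted for brevity):

collect_with_single_timer(MemberPids, Timeout) ->
    %% one shared timer instead of one receive timeout per spawned process;
    %% timer:send_after/2 delivers the given message to self() after Timeout
    {ok, TimerRef} = timer:send_after(Timeout, syn_multi_call_timeout),
    Result = collect(MemberPids, [], []),
    %% a real implementation would also flush a timer message that may have
    %% fired just before the cancel
    timer:cancel(TimerRef),
    Result.

collect([], Replies, BadPids) ->
    {Replies, BadPids};
collect(Pending, Replies, BadPids) ->
    receive
        {syn_multi_call_reply, Pid, Reply} ->
            collect(lists:delete(Pid, Pending), [{Pid, Reply} | Replies], BadPids);
        syn_multi_call_timeout ->
            %% the single global timeout fired: everything still pending is bad
            {Replies, BadPids ++ Pending}
    end.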

If you have a strong opinion let me know.

No, it makes sense. Forgot about the spawn. On the other hand, the problem exists at

%% gathers the replies proxied back by the spawned processes
collect_replies(MemberPids, Replies, BadPids) ->

A bit worried that this receive has no timeout/monitor either; it seems there's an assumption that the multi_call_and_receive process will never die and will always send a response back, which is perhaps too optimistic. It's unlikely that it dies (all things considered), but the day it happens we have a process stuck on our hands. (It would most likely survive most normal scenarios, but not a chaos monkey.)

This is actually managed. :)

As you can see, the multi_call_and_receive processes are spawned with spawn_link from the process that originally requests the multi_call.

So, if one of the multi_call_and_receive processes were to die (highly unlikely, but still), then the process that originally requested the multi_call would die too.
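
That is the standard link behaviour; a minimal illustration (generic, not syn's code):

link_example() ->
    %% spawn_link ties the two processes' fates together: if the helper exits
    %% abnormally, the exit signal also terminates the caller (which is not
    %% trapping exits), so the caller cannot sit in its receive forever
    Helper = spawn_link(fun() -> exit(crashed) end),
    receive
        {Helper, Reply} -> Reply   %% never reached: the caller is killed first
    end.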

Would you like to see a different behaviour?

Sounds good. I think I need to wake up. :-)

:) Ok then, looks like I can close this issue.

Yep, thanks!

This is now part of syn 1.3.1.