ostinelli / syn

A scalable global Process Registry and Process Group manager for Erlang and Elixir.

Syn never reconciles 2 processes with the same name

DmitryKakurin opened this issue.

Hello Roberto,

I'm facing an issue with syn where duplicate processes with the same name can be registered (on different Erlang nodes) and syn never kills the duplicates.

I've tried both :syn v1.6.3 and v2.0-rc.1, and it reproduces with both.

My repro steps are:

  1. Start 3 Erlang nodes on the same machine
  2. Interconnect them into an Erlang cluster
  3. Call :syn.init on each one (if using v1.6.3)
  4. Simultaneously start a process on all 3 nodes that, in a loop, sends :ping to {:via, :syn, "123"} using GenServer.call; if "123" is not found, it creates the process locally (registering it with syn), then sleeps for 1 sec and repeats. The "123" process simply replies :pong to :ping calls.
  5. Often, at this point, I would have more than one process named "123" running on different nodes.
  6. If the issue didn't reproduce in step 5, kill the OS process hosting the Erlang node where "123" is running. Both remaining nodes will then create and register their own instance of the "123" process.
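
The loop in step 4 might look roughly like the Elixir below. This is a minimal sketch, not the attached repro code: the module names `Repro.Pong` and `Repro.Pinger` are hypothetical, and the `registry` argument is an addition so the same code can run against `:syn` (using the v1.6.3 `{:via, :syn, name}` form from this issue) or any other via-compatible registry, such as Erlang's `:global`.

```elixir
defmodule Repro.Pong do
  # Hypothetical stand-in for the "123" process: it replies :pong to :ping.
  use GenServer

  # `registry` is any module usable in a via tuple (e.g. :syn or :global).
  def start(name, registry) do
    GenServer.start(__MODULE__, :ok, name: {:via, registry, name})
  end

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

defmodule Repro.Pinger do
  # Hypothetical driver for step 4: ping the named process, create it if missing.
  def ping_or_create(name, registry \\ :syn) do
    try do
      GenServer.call({:via, registry, name}, :ping)
    catch
      # GenServer.call exits with :noproc when the name is not registered.
      :exit, {:noproc, _} ->
        # Check-then-register gap: several nodes can reach this branch at once.
        case Repro.Pong.start(name, registry) do
          {:ok, _pid} -> :created
          {:error, {:already_started, _pid}} -> :lost_race
        end
    end
  end

  # Runs forever: one attempt per second, as in the repro steps.
  def loop(name, registry \\ :syn) do
    ping_or_create(name, registry)
    Process.sleep(1_000)
    loop(name, registry)
  end
end
```

The window between the failed `:ping` and the registration in `Repro.Pong.start/2` is the race: within one registry view a loser gets `{:error, {:already_started, pid}}`, but nodes that register concurrently on different sides of the cluster can each succeed until the registry detects and resolves the conflict.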

While it's unfortunate that multiple processes with the same name "123" can be created (it would be more desirable for us if registering a duplicate failed), the bigger issue is that this split-brain situation never gets resolved. According to the documentation, we expect syn to kill one of the duplicate "123" processes, but that never happens.
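
To make the expected behavior concrete, here is a toy Elixir illustration of registry conflict resolution. It is not syn's implementation: `Repro.Resolve.resolve/1` is hypothetical, and it assumes a "most recent registration wins" rule based on a registration timestamp, which is how the syn 2 documentation describes its default. Each entry pairs a conflicting pid with the `System.system_time/0` value recorded when it registered; every loser is killed.

```elixir
defmodule Repro.Resolve do
  # Hypothetical illustration, NOT syn's internal code.
  # `entries` is a non-empty list of {pid, registered_at} for one contested name.
  # Assumed rule: keep the most recent registration, kill all the others.
  def resolve(entries) when entries != [] do
    {winner, _ts} = Enum.max_by(entries, fn {_pid, ts} -> ts end)

    for {pid, _ts} <- entries, pid != winner do
      # syn uses its own exit reason; :kill here just makes the loss unconditional.
      Process.exit(pid, :kill)
    end

    winner
  end
end
```

With entries `[{p1, 100}, {p2, 300}, {p3, 200}]`, `p2` survives and `p1` and `p3` are killed, leaving exactly one process behind the name.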

I would appreciate any ideas, workarounds, or fixes.
Please let me know if you need more information or have any questions.

Thank you, Dmitry.

Hello Dmitry,
I'm not sure about your exact setup, but as I understand it, you end up with one process per node registered as "123", and the registered process differs from node to node. Please confirm that this is correct.

If so, what you are experiencing is a registration race condition. I'm currently working on those and will get back to you when I have something for you to try. Meanwhile, runnable code that reproduces the issue would help.

2.0.1 is out; please try it and let me know. If it doesn't solve what you are experiencing, please send me runnable code that reproduces it. Thanks,
r.

Thank you for such a quick turnaround; I really appreciate your help! I'll try it today and will extract a repro if it's not fixed.

So far I have been unable to reproduce it with 2.0.1. I will run more tests and let you know.

Closing. Feel free to reopen if you can reproduce it, with code that would allow me to proceed. Thank you.

Thanks a lot Roberto!
I don't often receive this level of support for commercial products :-).

I did a lot of testing with v2.0.1, and even though more than one "123" process sometimes gets created, the conflict is resolved very quickly and one of the processes gets killed (a big improvement over v2.0.0). That's what I'd expect, especially given how artificially precise the synchronization of process creation is in my test.

I'm attaching my repro code here for safekeeping in case we need to test for regressions in the future.
SynDup.zip

Thank you, Dmitry.