ostinelli / syn

A scalable global Process Registry and Process Group manager for Erlang and Elixir.

syn doesn't seem to deregister a process in some cases

bajankristof opened this issue

In some cases, when I start a gen_server process via syn under a supervisor and the process crashes, the supervisor's restart attempt fails with an already_started error. I am only guessing here, but I imagine this happens because the process isn't unregistered fast enough.

Anyway, if I replace {via, syn, Name} with {local, Name}, it works as expected (but naturally I want to use syn so that I can name my processes with terms rather than atoms).
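For readers following along, here is a minimal sketch of the kind of process described above. The module name via_syn_sketch and the trivial callbacks are hypothetical, and the sketch assumes a syn release whose via callbacks take a bare name, as the syn:whereis(test) call later in this thread suggests. With a via tuple, gen_server first asks the via module's whereis_name/1 whether the name is taken and returns {error, {already_started, Pid}} if it is, so a registry entry that still points at a dead pid would produce the error reported here.

```erlang
-module(via_syn_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Register under an arbitrary term through syn's via callbacks.
%% Swapping {via, syn, Name} for {local, Name} restricts Name to an atom.
start_link(Name) ->
    gen_server:start_link({via, syn, Name}, ?MODULE, Name, []).

%% The state is just the name; a real server would hold its own state here.
init(Name) ->
    {ok, Name}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With the syn application started, via_syn_sketch:start_link({connection, <<"test">>}) would register the process under a tuple, which {local, Name} cannot do.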

I attached a zip containing the project I just started working on. If you fire up the docker containers and then start the app with rebar3 shell, you can call kyu:kill(test). in the shell to trigger the problem (it doesn't always occur on the first attempt, but if you keep trying it will happen).

kyu.zip

I'm running this on macOS with the following versions:
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]
rebar 3.13.0 on Erlang/OTP 22 Erts 10.6.4

Hello @bajankristof,
I apologize, but I cannot dedicate the time to go through your code, build a docker image, and try repeatedly to reproduce the error. Let me see if we can pinpoint your issue, though.

  • Are you running it on a single node or in a cluster?
  • What happens if you wait before you restart the process, or if you start it manually afterwards? Does it get registered as expected?
  • Please provide logs.

Thanks for the reply, and I totally understand.

  1. I'm running it on a single node
  2. Here's a list of things that I tried:
    1. start the process without supervision -> kill it -> restart it manually (this works as intended)
    2. start the process with supervision and a temporary restart type -> kill it -> restart it manually (this works as intended as well)
    3. start the process with supervision and a permanent restart type (supervisor intensity: 10, period: 60) -> kill it -> the supervisor tries to restart it, and at this point the crash usually happens (roughly 8 out of 10 times)
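
The supervised setup in case 3 can be sketched roughly as follows. The module name kyu_sup_sketch and the one_for_one strategy are assumptions; the child id and start arguments are taken from the supervisor reports in the logs, and the intensity and period are the values quoted above. A failed restart attempt also counts toward the intensity, and the supervisor retries immediately, which would be consistent with the logs showing ten start_error reports within about 3 ms before reached_max_restart_intensity.

```erlang
-module(kyu_sup_sketch).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% At most 10 restarts within 60 seconds, as quoted in case 3.
    %% The one_for_one strategy is an assumption.
    SupFlags = #{strategy => one_for_one, intensity => 10, period => 60},
    %% Child id and start arguments follow the logs; shutdown and type
    %% are assumed defaults for a worker.
    Child = #{id => test,
              start => {kyu_connection, start_link, [test, #{}]},
              restart => permanent,
              shutdown => 5000,
              type => worker},
    {ok, {SupFlags, [Child]}}.
```

Changing restart to temporary gives the setup from case 2, where the supervisor never attempts a restart on its own.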

Here's an example of the logs that I get (I am using lager):

===> Verifying dependencies...
===> Compiling kyu
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]

Eshell V10.6.4  (abort with ^G)
1> ===> The rebar3 shell is a development tool; to deploy applications in production, consider using releases (http://www.rebar3.org/docs/releases)
=INFO REPORT==== 14-Feb-2020::12:58:46.501688 ===
Syn(nonode@nohost): Initiating full cluster groups sync for nodes: []

=INFO REPORT==== 14-Feb-2020::12:58:46.520079 ===
Syn(nonode@nohost): Initiating full cluster registry sync for nodes: []

12:58:46.545 [debug] Lager installed handler error_logger_lager_h into error_logger
file="/Users/bajankristof/Projects/Erlang/kyu/_build/default/lib/lager/src/lager_handler_watcher.erl" line=127 module=lager_handler_watcher pid=<0.248.0>
12:58:46.972 [debug] Kyu connection server started
application=kyu connection=test function=init line=36 module=kyu_connection node=nonode@nohost pid="<0.286.0>"
12:58:46.972 [debug] Kyu connection server trying connection
application=kyu connection=test function=handle_info line=55 module=kyu_connection node=nonode@nohost pid="<0.286.0>"
===> Booted syn
===> Booted syntax_tools
===> Booted compiler
===> Booted goldrush
===> Booted lager
===> Booted xmerl
===> Booted sasl
===> Booted tools
===> Booted jsx
===> Booted ranch
===> Booted recon
===> Booted credentials_obfuscation
===> Booted rabbit_common
===> Booted amqp_client
===> Booted kyu
12:58:46.987 [debug] Supervisor {<0.288.0>,amqp_connection_sup} started amqp_connection_type_sup:start_link() at pid <0.289.0>
pid=<0.288.0>
12:58:46.987 [debug] Supervisor {<0.288.0>,amqp_connection_sup} started amqp_gen_connection:start_link(<0.289.0>, {amqp_params_network,<<"guest">>,<<"SpT0j/1SDoxKsfULVIZYR7Bad5QdlfjKDxWpRdRGBtNBfUEfl/I+9oK0Flh+Tu...">>,...}) at pid <0.290.0>
pid=<0.288.0>
12:58:47.019 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started amqp_channel_sup_sup:start_link(network, <0.290.0>, <<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>) at pid <0.293.0>
pid=<0.289.0>
12:58:47.021 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started amqp_channels_manager:start_link(<0.290.0>, <<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>, <0.293.0>) at pid <0.294.0>
pid=<0.289.0>
12:58:47.026 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started rabbit_writer:start_link(#Port<0.14>, 0, 4096, rabbit_framing_amqp_0_9_1, <0.290.0>, <<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>) at pid <0.295.0>
pid=<0.289.0>
12:58:47.034 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started amqp_main_reader:start_link(#Port<0.14>, <0.290.0>, <0.294.0>, {method,rabbit_framing_amqp_0_9_1}, <<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>) at pid <0.296.0>
pid=<0.289.0>
12:58:47.042 [debug] Lager installed handler lager_backend_throttle into lager_event
file="/Users/bajankristof/Projects/Erlang/kyu/_build/default/lib/lager/src/lager_handler_watcher.erl" line=127 module=lager_handler_watcher pid=<0.241.0>
12:58:47.056 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started rabbit_heartbeat:start_heartbeat_sender(#Port<0.14>, 10, #Fun<amqp_network_connection.2.44686551>, {heartbeat_sender,<<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>}) at pid <0.297.0>
pid=<0.289.0>
12:58:47.056 [debug] Supervisor {<0.289.0>,amqp_connection_type_sup} started rabbit_heartbeat:start_heartbeat_receiver(#Port<0.14>, 10, #Fun<amqp_network_connection.3.44686551>, {heartbeat_receiver,<<"client 127.0.0.1:65292 -> 127.0.0.1:5672">>}) at pid <0.298.0>
pid=<0.289.0>
12:58:47.067 [debug] Kyu connection server connection up
application=kyu connection=test function=handle_info line=58 module=kyu_connection node=nonode@nohost pid="<0.286.0>"


1> erlang:exit(syn:whereis(test), kill).


12:58:56.104 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at <0.286.0> exit with reason killed in context child_terminated
pid=<0.285.0>
12:58:56.104 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
true
12:58:56.105 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.105 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
2> 12:58:56.105 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.105 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.106 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.106 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.106 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.106 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.107 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at undefined exit with reason {already_started,<0.286.0>} in context start_error
pid=<0.285.0>
12:58:56.107 [error] Supervisor kyu_sup had child test started with kyu_connection:start_link(test, #{}) at {restarting,<0.286.0>} exit with reason reached_max_restart_intensity in context shutdown
pid=<0.285.0>
12:58:56.107 [info] Application kyu exited with reason: shutdown
pid=<0.43.0>

(I put some extra whitespace at the point where I kill the process)

Just a quick explanation of what's happening:

  1. The app starts and fires up a supervisor and with that a gen_server process
  2. The gen_server process starts a RabbitMQ connection (not linked, without supervision, using the official amqp_client library)
  3. I kill the process using erlang:exit(syn:whereis(test), kill)
  4. The error occurs (usually)

The issue is really weird because it doesn't happen with a simplified gen_server process: I tried it in a simplified environment with just a dummy gen_server process and couldn't get the error to occur. At the same time, I can't find any errors in my current code base, which is really small at the moment.
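
One way to test the "not unregistered fast enough" guess from the opening post is to measure how long a lookup keeps returning a pid after that pid is confirmed dead. The probe below is only a sketch: the module unreg_probe and its function names are hypothetical, and the registry lookup is passed in as a fun so that nothing is assumed about syn beyond the syn:whereis/1 call already used in this thread.

```erlang
-module(unreg_probe).
-export([kill_and_time/2]).

%% Lookup is a fun such as fun syn:whereis/1.
%% Returns {ok, Microseconds} once Lookup(Name) stops returning the killed
%% pid, or {error, not_registered | timeout}.
kill_and_time(Lookup, Name) ->
    case Lookup(Name) of
        Pid when is_pid(Pid) ->
            Ref = erlang:monitor(process, Pid),
            exit(Pid, kill),
            %% Wait until the runtime confirms that the process is gone.
            receive {'DOWN', Ref, process, Pid, _} -> ok end,
            T0 = erlang:monotonic_time(microsecond),
            poll(Lookup, Name, Pid, T0, 1000);
        _ ->
            {error, not_registered}
    end.

%% Poll about once per millisecond, giving up after the remaining tries.
poll(_Lookup, _Name, _Pid, _T0, 0) ->
    {error, timeout};
poll(Lookup, Name, Pid, T0, Tries) ->
    case Lookup(Name) of
        Pid ->
            %% The registry still maps the name to the dead pid.
            timer:sleep(1),
            poll(Lookup, Name, Pid, T0, Tries - 1);
        _ ->
            {ok, erlang:monotonic_time(microsecond) - T0}
    end.
```

Calling unreg_probe:kill_and_time(fun syn:whereis/1, test) against an unsupervised process (the first setup in the list of things tried) avoids taking the application down. A consistently nonzero result would suggest that the registry entry briefly outlives the process, which is the window in which an immediate supervisor restart could see already_started.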

Hello, sorry for the late reply. Unfortunately, I do not see enough in the logs to understand where I could start looking. I do not see any logs from syn in the section after you kill the process...

BTW, it looks like you're not running distributed Erlang (nonode@nohost shows in your logs). Likely unrelated, but you should probably name your nodes.

The node name is missing only because I had just started a blank rebar3 project and didn't bother setting it up much. Anyway, thanks for the reply. Unfortunately, I can't provide more logs since this is all I got. Should I close this issue since neither of us is able to move forward with it?

Ok. If you happen to get some actionable items or insights, do not hesitate to reopen. Thank you.