ShoreTel-Inc / erld

Erlang UNIX daemon wrapper

Error starting node using erld

eproxus opened this issue

I have added the heartbeat process to one of my supervisors. I'm also calling erld:detach() in my application start/2 callback (before it returns).

The following error is displayed when starting the service using the example init script:

$ service mynode start

Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
Starting the My Node server... Erlang exited before detaching, not trying to restart it.
FAILED (check erld.log for errors)

show output:

$ service mynode show
The server is started in the following environmnent:
(HOME is already set when running interactively.)
export HOME='/home/deploy'
The command used to start the server is:
/usr/bin/erld  -c /etc/mynode/cookie -l /var/log/mynode/erld.log -p /var/run/mynode.pid -t 22 -T 8 -g 10 -r 3 -i 1200 -M erld_logger -F rotate_logs -- /usr/bin/erl +Ktrue +B -noinput -config /etc/mynode/sys -name divinity-backend -boot /etc/mynode/mynode -shutdown_time 5000 +W w
The command run by erld is:
/usr/bin/erl +Ktrue +B -noinput -config /etc/mynode/sys -name mynode -boot /etc/mynode/mynode -shutdown_time 5000 +W w
To run without overriding the error logger remove the "-config" option.

erld.log:

2014-02-11 09:44:56.652633 19724 erl: 09:44:56.646 [info] Application lager started on node 'mynode@app01.domain.com'
2014-02-11 09:44:56.654388 19724 erl: 09:44:56.654 [info] Application sasl started on node 'mynode@app01.domain.com'
2014-02-11 09:44:56.662688 19724 accept_erlang_connection: ei_accept failed (5): Input/output error
2014-02-11 09:44:56.662719 19724 erld_main_loop: Failed to accept first connection from erl node: exit.
2014-02-11 09:44:56.663661 19724 cleanup_child: erl exited with status 0: terminated by signal 1 (no core dump).
2014-02-11 09:44:56.663698 19724 erld_main: Erlang exited before detaching, not trying to restart it.

I'm having a hard time interpreting the error messages in the erld.log.

Ok, implemented the bake_cookie callback, and now the crash dump line is gone. New output of service mynode start:

Starting the Divinity Backend server... Erlang exited before detaching, not trying to restart it.
FAILED (check erld.log for errors)

The rest of the logs and output is the same.

commented

Hi Adam,

The important error here is the one about ei_accept(). That’s the function (provided by an official Erlang library) that accepts a connection from another Erlang node; it’s how your Erlang node communicates with erld, which appears to it as a “hidden C node” (sorry if you already knew this).

So what appears to be happening is this: erld starts and runs erl, then something connects to erld’s network socket, but when erld tries to accept the connection something goes wrong and produces an I/O error.

I’ve seen this happen when there’s something wrong with the networking, either because the Erlang node-to-node communication isn’t working (e.g. normal Erlang nodes aren’t able to communicate properly) or because something else is sending packets to its port, or when the Erlang VM is crashing very quickly as it starts up.

(I haven’t had time to specifically test these again to see if they produce the same I/O error, though.)

It might be useful to rule out the Erlang VM crashing or exiting quickly; does your node start up OK without erld, but otherwise using exactly the same options to “erl”?
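
To make that check easier to read, the command can be wrapped so the shell reports how it ended. This is a minimal sketch, assuming a POSIX-style sh in which an exit status above 128 conventionally means the process was killed by a signal (status minus 128). `describe_exit` is a hypothetical helper name, not part of erld.

```shell
#!/bin/sh
# Hypothetical helper (not part of erld): run a command in the foreground
# and print how it ended.
# Assumption: the shell reports "killed by signal N" as exit status 128 + N.
describe_exit() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Example use, with the command that "service mynode show" says erld runs:
# describe_exit /usr/bin/erl +Ktrue +B -noinput -config /etc/mynode/sys \
#     -name mynode -boot /etc/mynode/mynode -shutdown_time 5000 +W w
```

If the node stays up when run this way, a quick crash of the VM becomes a less likely explanation for the ei_accept() failure.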

Can you write some debug output to standard out early on in your application and see if it appears (both when running outside erld, and when running inside it)?

Another thing to check is to make absolutely sure that the same cookie is being used by both erld and your Erlang node. Again, I haven’t tested that specifically to see if it produces exactly this error, but a mismatch does cause the node connection to fail (I’d love to provide better output from erld about this, but ei_accept() doesn’t return anything useful). I’ll check this if I have time.
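
A rough way to compare the two cookies from the shell is sketched below. It assumes the Erlang node is started without -setcookie, in which case erl reads its cookie from $HOME/.erlang.cookie, and that erld gets its cookie from the file passed with -c. `cookies_match` is a hypothetical helper, and ignoring a trailing newline is an assumption of this sketch rather than documented erld behaviour.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both cookie files are readable and
# hold the same cookie text.
cookies_match() {
    # Refuse to compare if either file is missing or unreadable.
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Command substitution strips trailing newlines, so a stray newline at
    # the end of either file is ignored (an assumption of this sketch).
    [ "$(cat "$1")" = "$(cat "$2")" ]
}

# Example use, with the paths from the "show" output earlier in this thread:
# if cookies_match /etc/mynode/cookie /home/deploy/.erlang.cookie; then
#     echo "cookies match"
# else
#     echo "cookies differ or a file is unreadable"
# fi
```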

Sam.

template<class T,class...>class C{C<T*const,T,C>a;C<T,C>b;};Cc;

Managed to get it to work. The only way I can use the supplied init script is by adding -setcookie my_cookie to the end of ERL_COMMAND. What was the idea behind the cookie file and how did you imagine it would be used by the actual Erlang node?
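
For anyone applying the same workaround, here is a sketch of how the cookie could be read from the file given to erld with -c instead of being hardcoded. `append_setcookie` is a hypothetical helper, not something from the supplied init script, and it assumes the cookie file contains only the cookie, with no whitespace or shell metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print an erl command line with "-setcookie <cookie>"
# appended, taking the cookie from the given file.
# Assumption: the file holds just the cookie text.
append_setcookie() {
    erl_command=$1
    cookie_file=$2
    # Fail without printing anything if the cookie file cannot be read.
    [ -r "$cookie_file" ] || return 1
    printf '%s -setcookie %s\n' "$erl_command" "$(cat "$cookie_file")"
}

# Example use inside the init script (ERL_COMMAND is the script's variable;
# /etc/mynode/cookie is the path passed to erld with -c):
# ERL_COMMAND=$(append_setcookie "$ERL_COMMAND" /etc/mynode/cookie)
```

This keeps erld and the node on the same cookie by construction, at the cost of the cookie being visible in the process list.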

Now service mynode stop does not work. It times out and hard kills the process. Nothing shows up in erld.log or my application log as it does this. I tried adding -setcookie my_cookie to the eval in stop() in the init script, but this did nothing.

Hi eproxus,
How are you getting on with erld now that you've made those tweaks you kindly sent us? If you're still having issues I'm happy to try to help.

Cheers,

Bernard

I also managed to get it to stop properly, although I'm not exactly sure how. 😄 Thanks for all the help!

Bernard,

The application works flawlessly if run directly using the same erl command as in the init script. However, if I run it with the init script, I get the errors below. The node name and cookie issues are fixed, and Erlang node-to-node communication even works between guest and host on a Mac using VirtualBox.

accept_erlang_connection: ei_accept failed (5): Input/output error
2014-10-17 18:17:58.155103 1708 erld_main_loop: Failed to accept first connection from erl node: exit.
2014-10-17 18:17:58.156029 1708 cleanup_child: erl exited with status 0: terminated by signal 1 (no core dump).
2014-10-17 18:17:58.156071 1708 erld_main: Erlang exited before detaching, not trying to restart it.

Any input is appreciated, thanks.

Suresh