Nerve fails to restart on watcher failure
Jaykah opened this issue
I saw a similar issue reported earlier, but since its fix has apparently been merged, I decided to open a new one.
I am using a simple MySQL check to register the members of a Galera Cluster.
I, [2014-08-30T09:47:26.757807 #40790] INFO -- Nerve::Reporter::Zookeeper: nerve: successfully created zk connection to x.example.com:2181,x2.example.com:2181,x3.example.com:2181/services/database
I, [2014-08-30T09:47:26.776437 #40790] INFO -- Nerve::ServiceCheck::MySQLServiceCheck: nerve: service check user@10.1.1.1 initial check returned true
I, [2014-08-30T09:47:26.803240 #40790] INFO -- Nerve::ServiceWatcher: nerve: service db is now up
I, [2014-08-30T13:58:51.491719 #40790] INFO -- Nerve::ServiceCheck::MySQLServiceCheck: nerve: service check user@10.1.1.1 got error #<RuntimeError: failed to connect with mysql: ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
>
I, [2014-08-30T14:00:08.381207 #40790] INFO -- Nerve::ServiceCheck::MySQLServiceCheck: nerve: service check user@10.1.1.1 got error #<RuntimeError: failed to connect with mysql: ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
>
I, [2014-08-30T17:22:00.684380 #40790] INFO -- Nerve::ServiceCheck::MySQLServiceCheck: nerve: service check user@10.1.1.1 got error #<RuntimeError: failed to connect with mysql: ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
>
After that, the checks stop: no further check results are logged, and although the node has since been restored, it is not registered in ZooKeeper again.
Hi Guys,
I have a similar problem.
Nerve works very well at unregistering an instance that has problems (based on health/ping checks), but when that same instance comes back up, Nerve doesn't register it in ZK again.
If I force a restart of Nerve, everything works perfectly, but that is not an elegant way to fix the problem.
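For reference, the behavior both reports expect (unregister on a failed check, register again once the check passes, without restarting the process) can be sketched roughly as below. This is only an illustration under stated assumptions, not Nerve's actual code: `SketchWatcher`, `LogReporter`, `tick`, `report_up`, and `report_down` are hypothetical names, and the reporter is an in-memory stand-in for the ZooKeeper reporter. The key point is that an exception raised by the check is treated as a failed check rather than being allowed to end the watcher loop.

```ruby
# Hypothetical sketch; none of these classes are part of Nerve.
class SketchWatcher
  attr_reader :registered

  # check:    callable that returns true/false and may raise
  # reporter: object responding to #report_up and #report_down
  def initialize(check, reporter)
    @check = check
    @reporter = reporter
    @registered = false
  end

  # Run one check cycle. A raised error counts as a failed check
  # instead of terminating the watcher.
  def tick
    healthy = begin
      @check.call ? true : false
    rescue StandardError
      false
    end

    if healthy && !@registered
      @reporter.report_up      # service recovered: register it
      @registered = true
    elsif !healthy && @registered
      @reporter.report_down    # service failed: unregister it
      @registered = false
    end
    healthy
  end
end

# In-memory stand-in for a ZooKeeper reporter (assumption).
class LogReporter
  attr_reader :events

  def initialize
    @events = []
  end

  def report_up
    @events << :up
  end

  def report_down
    @events << :down
  end
end

# Simulated check results: up, two errors, then up again.
results = [true, :raise, :raise, true]
check = lambda do
  result = results.shift
  raise 'WSREP has not yet prepared node for application use' if result == :raise
  result
end

reporter = LogReporter.new
watcher = SketchWatcher.new(check, reporter)
4.times { watcher.tick }
```

In this sketch the watcher survives the two raised errors and registers the service again on the fourth cycle, which is the outcome a manual restart currently has to force.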
Please let me know if you are still seeing this issue, and we can re-open and dive into it more.