DLTcollab / dcurl

Hardware-accelerated Multi-threaded IOTA PoW, drop-in replacement for ccurl

Make the remote worker reconnect to RabbitMQ after the broker restarts

marktwtn opened this issue

If RabbitMQ is restarted, the remote worker shuts down.
We should modify the remote worker so that it keeps trying to connect to RabbitMQ while the broker is unavailable.

I know how to fix the problem: just run the initialization again.

However, there are two points that need attention:

  • The program execution flow must be handled carefully.
  • The condition for reinitialization:
    The current implementation only returns true or false when consuming a message.
    The wrapper function hides the error type, which makes it hard to know the exact problem.

The wrapper function should be rewritten to expose more error detail than just true or false.

The error type is recorded in the amqp_rpc_reply_t structure, and the possible errors are listed in amqp_status_enum.
Once we retrieve the amqp_rpc_reply_t structure, we can check the error type and recover from it.
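
A minimal sketch of what the reworked wrapper could return instead of true or false. All names here (worker_error_t, worker_status_t, classify_consume_error) are hypothetical and not part of dcurl or rabbitmq-c. In the real worker the error would first be extracted from amqp_rpc_reply_t (an AMQP_RESPONSE_LIBRARY_EXCEPTION reply carries an amqp_status_enum value in library_error); that step is left out so the sketch compiles without rabbitmq-c, and which status codes actually show up on a broker restart is an assumption to verify.

```c
#include <assert.h>

/* Hypothetical, library-independent error categories. In the worker these
 * would be filled in from amqp_rpc_reply_t; the comments name rabbitmq-c
 * status codes that could map onto each category (an assumption). */
typedef enum {
    ERR_NONE = 0,          /* AMQP_RESPONSE_NORMAL */
    ERR_CONNECTION_LOST,   /* e.g. AMQP_STATUS_CONNECTION_CLOSED, AMQP_STATUS_SOCKET_ERROR */
    ERR_TIMEOUT,           /* e.g. AMQP_STATUS_TIMEOUT */
    ERR_OTHER              /* anything not handled yet */
} worker_error_t;

/* Hypothetical result of the reworked wrapper, replacing true/false. */
typedef enum {
    WORKER_OK,             /* message consumed, continue as usual */
    WORKER_RETRY,          /* nothing fatal happened, call consume again */
    WORKER_REINIT,         /* broker went away, run the reinitialization loop */
    WORKER_FATAL           /* unknown error, report it and stop */
} worker_status_t;

/* Decide what the worker should do next for a given error category.
 * Only the broker-restart case is recovered by reinitialization. */
static worker_status_t classify_consume_error(worker_error_t err) {
    switch (err) {
    case ERR_NONE:
        return WORKER_OK;
    case ERR_TIMEOUT:
        return WORKER_RETRY;
    case ERR_CONNECTION_LOST:
        return WORKER_REINIT;
    default:
        return WORKER_FATAL;
    }
}
```

Keeping the category separate from the action lets new error types be added later without touching the worker loop.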

For now, we only handle the error caused by a restarted RabbitMQ broker, and recover the remote worker by reinitializing it.

The reinitialization will be wrapped in an infinite loop that keeps reinitializing until it succeeds.
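
A sketch of such a loop, assuming a hypothetical init callback that performs the whole connection setup and returns true on success. The doubling, capped delay between attempts is an addition not mentioned in this issue; it only keeps the worker from retrying in a tight loop while the broker is down. flaky_init is a stand-in for trying the loop, not real worker code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical initializer: sets up the whole RabbitMQ connection (socket,
 * login, channel, consumer) and returns true on success. */
typedef bool (*init_fn)(void *ctx);

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Call init() until it succeeds. After each failure, wait and double the
 * delay up to max_delay_ms. Returns the number of attempts made. */
static unsigned reinit_until_success(init_fn init, void *ctx,
                                     long base_delay_ms, long max_delay_ms) {
    long delay = base_delay_ms;
    for (unsigned attempt = 1;; attempt++) {
        if (init(ctx))
            return attempt;
        fprintf(stderr, "reinitialization attempt %u failed, retrying in %ld ms\n",
                attempt, delay);
        sleep_ms(delay);
        delay = (delay * 2 > max_delay_ms) ? max_delay_ms : delay * 2;
    }
}

/* Stand-in initializer for trying the loop: fails while *remaining > 0. */
static bool flaky_init(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

For a constant retry interval, pass the same value as base_delay_ms and max_delay_ms.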

In the remote worker, errors from three RabbitMQ API calls need to be handled properly.

However, according to the documentation,

amqp_basic_ack says

this will not indicate failure if something goes wrong on the broker

and amqp_basic_publish says

error conditions that occur on the broker (such as publishing to a non-existent exchange)
will not be reflected in the return value of this function

I will test how these two APIs behave when the RabbitMQ broker is stopped or restarted.

Even after I shut down the RabbitMQ broker, amqp_basic_ack() and amqp_basic_publish() can still return without error as long as the network socket has not been closed yet.

While the socket is still open, the first API call returns successfully.
After it returns, the socket becomes closed,
and the next API call encounters a socket error and fails.
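
This sequence can be reproduced without RabbitMQ on loopback TCP. The sketch below uses plain POSIX sockets with hypothetical helper names: it closes the "broker" end so the "worker" end sits in CLOSE-WAIT, then sends twice. On Linux, the first send() is accepted by the local kernel, the closed peer answers that data with an RST, and the second send() fails with EPIPE or ECONNRESET. The "ack" and "pub" payloads only label the calls that stand in for amqp_basic_ack() and amqp_basic_publish(), and the 100 ms pauses are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Connect a client ("worker") and an accepted server socket ("broker")
 * over loopback TCP. Returns 0 on success, -1 on failure. */
static int make_loopback_pair(int *worker, int *broker) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the kernel choose a port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0) {
        close(listener);
        return -1;
    }
    *worker = socket(AF_INET, SOCK_STREAM, 0);
    if (*worker < 0) {
        close(listener);
        return -1;
    }
    if (connect(*worker, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(*worker);
        close(listener);
        return -1;
    }
    *broker = accept(listener, NULL, NULL);
    close(listener);                       /* the pair no longer needs it */
    if (*broker < 0) {
        close(*worker);
        return -1;
    }
    return 0;
}

/* Return values of two send() calls made after the peer closed, and the
 * errno left by the second one. */
struct send_result {
    ssize_t first;
    ssize_t second;
    int second_errno;
};

static struct send_result send_after_peer_close(void) {
    struct send_result r = { -1, -1, 0 };
    int worker, broker;
    if (make_loopback_pair(&worker, &broker) < 0)
        return r;
    signal(SIGPIPE, SIG_IGN);              /* get EPIPE from send() instead of a signal */

    close(broker);                         /* "broker" exits: FIN sent, worker enters CLOSE-WAIT */
    sleep_ms(100);

    r.first = send(worker, "ack", 3, 0);   /* accepted by the local kernel */
    sleep_ms(100);                         /* the closed peer answers with RST */
    errno = 0;
    r.second = send(worker, "pub", 3, 0);  /* the connection is now reset */
    r.second_errno = errno;

    close(worker);
    return r;
}
```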

I am still trying to figure out how to solve the problem.

Closing a TCP connection takes a four-way handshake.
When the RabbitMQ broker is shut down, its socket should wait for the response from the remote worker's socket,
as shown in the picture:
[Figure: tcp-close-state-flow]

However, when I shut down the RabbitMQ broker, it does not wait; it just closes.
The remote worker's socket is left in the CLOSE-WAIT state,
and amqp_basic_ack and amqp_basic_publish do not fail while the socket is in CLOSE-WAIT.
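
One possible way to notice CLOSE-WAIT before calling amqp_basic_ack() or amqp_basic_publish() is to check whether the peer's FIN has already arrived: poll() reports the socket readable, and recv() with MSG_PEEK returns 0 without consuming anything. This is a sketch under assumptions, not a fix this issue settled on; the helper names are hypothetical, and in the worker the descriptor could be obtained with amqp_get_sockfd().

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Connect a client ("worker") and an accepted server socket ("broker")
 * over loopback TCP. Returns 0 on success, -1 on failure. */
static int make_loopback_pair(int *worker, int *broker) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the kernel choose a port */
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0) {
        close(listener);
        return -1;
    }
    *worker = socket(AF_INET, SOCK_STREAM, 0);
    if (*worker < 0) {
        close(listener);
        return -1;
    }
    if (connect(*worker, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(*worker);
        close(listener);
        return -1;
    }
    *broker = accept(listener, NULL, NULL);
    close(listener);
    if (*broker < 0) {
        close(*worker);
        return -1;
    }
    return 0;
}

/* Returns 1 if the peer's FIN has arrived (our end is in CLOSE-WAIT),
 * 0 if the connection still looks open, -1 on error. Reads nothing. */
static int peer_has_closed(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 0);          /* timeout 0: do not block */
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                          /* nothing readable: no FIN yet */
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK); /* peek, leave data for the library */
    if (n == 0)
        return 1;                          /* end of stream: peer sent FIN */
    return n > 0 ? 0 : -1;                 /* pending data, or a socket error */
}
```

Two limitations apply. If the broker sent a frame (for example Connection.Close) before its FIN and the worker has not read it yet, the peek returns that data and the check reports the connection as open. The FIN can also arrive right after the check, so a failing call still has to be handled.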

Other people have encountered the same issue: alanxz/rabbitmq-c#461.
The client believes it is still connected to the server.

Another related issue: alanxz/rabbitmq-c#391.