Handle duplicated messages after connection lost

Question

msaraiva opened this issue 5 years ago

If the connection is lost after a message has been successfully processed but before it has been acknowledged, the message can no longer be acknowledged, because the acknowledgement is bound to the channel that delivered it. The documentation states that:

A channel only exists in the context of a connection and never on its own. When a connection is closed, so are all channels on it.

That means messages that were processed but not acknowledged will be requeued and processed more than once. However, the documentation also explains that:

If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped)

That raises a couple of questions:

Should broadway_rabbitmq have a built-in way to handle messages duplicated by a lost connection?
Can we use the redelivered flag to avoid processing the message again? If so, how can we check whether the redelivered message was already processed successfully? The new delivery will have a different delivery_tag on a new channel, so the two deliveries cannot be compared by tag. Is there another way?

Lajos Gerecs · Answer 1 · Tue Mar 19 2019 19:17:35 GMT+0800 (China Standard Time)

Hi,

The redelivered flag tells you that the message was sent out at least once, but not whether it was actually processed. For example, when you view a message in the management interface it gets marked as redelivered, so you can't just throw away every redelivered message. If you have distributed consumers, agreeing on what was actually processed may be harder than simply processing the message again, if that makes sense. You can put a unique id into each message to keep track of it and record whether processing finished.
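
To illustrate the unique id idea, here is a minimal, library-agnostic sketch in Python. Everything in it is hypothetical: `ProcessedStore`, `handle_delivery`, and the `message_id` argument are names made up for the example, and the in-memory set stands in for whatever shared store (a database or cache) distributed consumers would actually need.

```python
import threading


class ProcessedStore:
    """In-memory stand-in for a shared store (hypothetical; use a database or cache in practice)."""

    def __init__(self):
        self._done = set()
        self._lock = threading.Lock()

    def is_done(self, message_id):
        with self._lock:
            return message_id in self._done

    def mark_done(self, message_id):
        with self._lock:
            self._done.add(message_id)


def handle_delivery(store, message_id, payload, process):
    """Run `process` at most once per message_id; return True if work was done."""
    if store.is_done(message_id):
        # Finished on an earlier delivery: skip the work, the caller only needs to ack.
        return False
    process(payload)
    # Record completion only after processing succeeds, so a crash in between
    # causes a retry rather than a silently dropped message.
    store.mark_done(message_id)
    return True
```

Note that the check and the write are two separate steps, so two consumers receiving the same message at the same moment could still both process it. Closing that gap needs an atomic operation in the store or an idempotent handler, which is part of the complexity mentioned above.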

Marlus Saraiva · Answer 2 · Tue Mar 19 2019 22:19:01 GMT+0800 (China Standard Time)

Hi @luos!

Thanks! That was very helpful. It looks like handling duplicated messages would add a lot of complexity to Broadway, especially since a message can be redelivered to a different distributed consumer. So I believe we should keep this out of Broadway's scope. However, I think we should give users a way to implement their own solution when necessary. One idea would be to define a handle_unacknowledged_message callback. @josevalim WDYT?

José Valim · Answer 3 · Wed Mar 20 2019 22:41:39 GMT+0800 (China Standard Time)

Thanks @luos!

@msaraiva I think for now we can leave this issue open and we will work on it once we have more use cases + feedback.

José Valim · Answer 4 · Mon Jul 15 2019 17:01:43 GMT+0800 (China Standard Time)

Btw, I think this is no longer relevant now that we have metadata. With metadata, we can expose the redelivered flag to users and then they can act accordingly, isn't this correct?
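
As a rough sketch of what "act accordingly" could look like, the Python snippet below treats the redelivered flag as a hint and only consults a record of finished messages when the flag is set. The metadata shape (a dict with `redelivered` and `message_id` keys) and the `should_process` helper are assumptions for illustration, not Broadway's actual API.

```python
def should_process(metadata, already_processed):
    """Decide whether to run the handler for a delivery.

    metadata: dict with a boolean "redelivered" entry and a "message_id"
        entry (hypothetical shape, for illustration only).
    already_processed: callable taking a message id and returning True if
        processing of that message is known to have finished.
    """
    if not metadata.get("redelivered", False):
        # The broker has not delivered this message before, so there is
        # nothing to look up.
        return True
    # Redelivered only means the message left the broker at least once;
    # check the record to see whether processing actually finished.
    return not already_processed(metadata["message_id"])
```

With this split, the lookup cost is paid only for redelivered messages, which should be the rare case.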

László Hegedüs · Answer 5 · Fri Nov 22 2019 16:14:03 GMT+0800 (China Standard Time)

~~What about adding some option called delivery_method? With values :at_most_once, :at_least_once. In case someone doesn't want to or doesn't know how to use metadata.~~

edit: Sorry, just noticed this has already been considered. #37