dashbitco / broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Home Page: https://elixir-broadway.org

`Message.ack_immediately/1` should return a list of messages which could not be acknowledged successfully

joeLepper opened this issue

Background

We have a use-case where a consumer calls Message.ack_immediately/1 as soon as it pulls a message from a queue to ensure that no other consumers handle it – we would rather not handle a message than double-handle a message and therefore take pains to ensure that we ack as soon as possible.

Recently we have noticed that some messages do get double-handled. Digging into this we realized that when Message.ack_immediately/1 gets called it does not ensure that the ack was successful, nor does it alert our consumer that the ack was unsuccessful. Rather, it ignores any failures or errors in acknowledging receipt of a message.

The cases that we have observed in our logging indicate that the process that broadway_rabbitmq is calling to ack the message is no longer alive.

"Could not ack/reject message: ** (exit) exited in: :gen_server.call(#PID<0.14927.0>, {:call, {:\"basic.ack\", 3, false}, :none, #PID<0.3840.0>}, 60000) --     ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started --     (stdlib 3.11.1) gen_server.erl:223: :gen_server.call/3 --     (amqp 1.6.0) lib/amqp/basic.ex:135: AMQP.Basic.ack/3 --     (broadway_rabbitmq 0.6.2) lib/broadway_rabbitmq/producer.ex:440: anonymous fn/3 in BroadwayRabbitMQ.Producer.ack_messages/3 --     (elixir 1.10.2) lib/enum.ex:783: Enum.\"-each/2-lists^foreach/1-0-\"/2 --     (elixir 1.10.2) lib/enum.ex:783: En"

This log entry originates here

Proposal

Consumers which are calling Message.ack_immediately/1 are doing so because they need their message to definitely be acknowledged before processing it. In our case, if the message cannot be successfully acknowledged, we would rather drop it than process it. Therefore, Broadway should return a list of messages which could not be successfully acknowledged.

Sort messages handled by BroadwayRabbitMQ.Producer.ack_messages/3 into successful and unsuccessful groups

Replace Enum.each in BroadwayRabbitMQ.Producer.ack_messages/3 with Enum.reduce, partitioning the messages into those which were successfully acknowledged and those which were not, in a map like the following:

%{
  successes: [...messages...],
  failures: [...messages...]
}

This will involve both checking the return value from apply_ack_func to see if it is :ok or {:error, error} and adding any messages which end up in the catch block to the failures list.
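The partitioning step described above could look roughly like the following. This is a minimal sketch, not BroadwayRabbitMQ's actual code: the module name `AckPartitionSketch` and the `ack_fun` argument are hypothetical, with `ack_fun` standing in for apply_ack_func and assumed to return :ok or {:error, reason}.

```elixir
defmodule AckPartitionSketch do
  # Hypothetical stand-in for the proposed reduce; not the library's code.
  # `ack_fun` is an assumed 1-arity function returning :ok or {:error, reason}.
  def partition(messages, ack_fun) do
    initial = %{successes: [], failures: []}

    messages
    |> Enum.reduce(initial, fn message, acc ->
      case safe_ack(message, ack_fun) do
        :ok -> Map.update!(acc, :successes, &[message | &1])
        {:error, _reason} -> Map.update!(acc, :failures, &[message | &1])
      end
    end)
    # Messages were prepended, so restore the original order.
    |> Map.new(fn {key, list} -> {key, Enum.reverse(list)} end)
  end

  # Turns raises, exits (such as a dead channel process) and throws
  # into {:error, reason}, so those messages land in :failures.
  defp safe_ack(message, ack_fun) do
    case ack_fun.(message) do
      :ok -> :ok
      {:error, reason} -> {:error, reason}
    end
  rescue
    exception -> {:error, exception}
  catch
    kind, reason -> {:error, {kind, reason}}
  end
end
```

With this shape, a message whose ack call exits because the channel process is gone ends up in the failures list instead of only being logged.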

Return this map from BroadwayRabbitMQ.Producer.ack_messages/3

Return an acknowledgement status for each message passed to Broadway.Acknowledger.ack_messages/2

Replace the Enum.each in BroadwayRabbitMQ.Producer.ack_messages/3 with an Enum.reduce which merges the maps returned from Broadway.Acknowledger.ack_messages/2 together, and return it.
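Merging several status maps into one could be sketched as follows. The module name `AckStatusMergeSketch` is hypothetical, and the sketch assumes every map has the successes/failures shape proposed earlier in this issue.

```elixir
defmodule AckStatusMergeSketch do
  @empty %{successes: [], failures: []}

  # `results` is assumed to be a list of status maps,
  # each shaped like %{successes: [...], failures: [...]}.
  def merge(results) do
    Enum.reduce(results, @empty, fn result, acc ->
      # For keys present in both maps, append the new list to the
      # accumulated one so the original ordering is preserved.
      Map.merge(acc, result, fn _key, so_far, new -> so_far ++ new end)
    end)
  end
end
```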

Return the acknowledgement status for each message passed to Broadway.Message.ack_immediately/1

Stop ignoring the return value of Broadway.Acknowledger.ack_messages/2 (because it is no longer always nil) and pass that to the caller.

Repeat this process for Broadway's other connectors

The other Broadway connectors will need to have their Producer.ack_messages/3 function updated to return this acknowledgement status map.

Conclusion

We are happy to submit fixes as outlined in this issue (or a different approach which might come out of conversation here). I'm going to craft a draft so that there is something a bit more concrete to pick at as a straw man.

Hi @joeLepper, thanks for the detailed wrap-up.

> Digging into this we realized that when Message.ack_immediately/1 gets called it does not ensure that the ack was successful, nor does it alert our consumer that the ack was unsuccessful.

This is not supposed to happen. The ack callback should fail if it cannot acknowledge a message. My suggestion is to make sure the RabbitMQ driver is raising in these scenarios, which will surface it enough for you to pick it up.

@josevalim so you advise going with an exception if broadway_rabbitmq can't ack the message? I think that's the quickest path to success here, but maybe by returning whether the ack callback succeeded or not we're not gonna paint ourselves in a corner for the future if we want to do it at some point, since the acker is a behavior IIRC.

We already try/catch in the other places, so broadway_rabbitmq should definitely raise if it failed. I agree this is not ideal but this is a non-breaking change we can do right now.
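To illustrate the raise-on-failure approach from the caller's side, here is a minimal sketch under stated assumptions. It does not use Broadway's or broadway_rabbitmq's API: `RaisingAckSketch`, `ack!/2`, `ack_fun` and `process_fun` are all hypothetical names, with `ack_fun` standing in for the driver's ack call.

```elixir
defmodule RaisingAckSketch do
  # Hypothetical driver-side helper: raise instead of swallowing a failed ack.
  # `ack_fun` is an assumed function returning :ok or {:error, reason}.
  def ack!(message, ack_fun) do
    case ack_fun.(message) do
      :ok -> :ok
      {:error, reason} -> raise "could not ack message: #{inspect(reason)}"
    end
  end

  # Hypothetical caller: process the message only if the ack went through,
  # otherwise drop it rather than risk double-handling.
  def ack_then_process(message, ack_fun, process_fun) do
    case try_ack(message, ack_fun) do
      :ok -> {:processed, process_fun.(message)}
      {:error, reason} -> {:dropped, reason}
    end
  end

  # Only the ack is wrapped, so errors raised while processing
  # are not mistaken for ack failures.
  defp try_ack(message, ack_fun) do
    ack!(message, ack_fun)
  rescue
    exception -> {:error, Exception.message(exception)}
  catch
    :exit, reason -> {:error, reason}
  end
end
```

Because the failure surfaces as an exception or exit, a consumer that prefers dropping over double-handling can make that decision at the call site without any change to the return types.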

Alright, that makes sense, we'll get on it :)

@josevalim we fixed broadway_rabbitmq and released v0.6.3. Do we need to do anything here in Broadway or in other drivers, or can we close this?