(Non-)reliable Azure/WSB messaging in V3

ma499 opened this issue 8 years ago

Previous versions of Nimbus used the default PeekLock receive mode when reading messages from Azure / Windows Service Bus Queues and Subscriptions. This meant that Service Bus provided an At Least Once message delivery guarantee.

In v3, PeekLock mode is still used for processing dead-letter Queues, but primary Queues and Subscriptions are now read in ReceiveAndDelete mode. This means that Service Bus now only provides an At Most Once delivery guarantee. Messages can get lost, for example, when a consumer process crashes before the handler has finished processing the message.
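To make the difference concrete, here is a minimal sketch of a consumer that crashes mid-handler under each receive mode. It uses a toy in-memory queue, not the Service Bus client; the `SimulatedQueue` class, its methods and `consume_with_crash` are all hypothetical.

```python
from collections import deque


class SimulatedQueue:
    """Hypothetical in-memory stand-in for a Service Bus queue."""

    def __init__(self, messages):
        self.available = deque(messages)
        self.locked = {}  # messages currently hidden behind a PeekLock

    def receive_and_delete(self):
        # The broker deletes the message as soon as it hands it over.
        return self.available.popleft()

    def peek_lock(self):
        # The broker hides the message behind a lock but keeps it.
        msg = self.available.popleft()
        self.locked[msg] = True
        return msg

    def complete(self, msg):
        # Handler succeeded: only now is the message deleted for good.
        del self.locked[msg]

    def expire_locks(self):
        # Lock timed out (e.g. the consumer died): message becomes visible again.
        for msg in list(self.locked):
            del self.locked[msg]
            self.available.append(msg)


def consume_with_crash(queue, mode):
    """Receive one message, then 'crash' before the handler finishes."""
    if mode == "ReceiveAndDelete":
        queue.receive_and_delete()
    else:
        queue.peek_lock()
    # ... the process dies here, so complete() is never called ...
    queue.expire_locks()
    return list(queue.available)


print(consume_with_crash(SimulatedQueue(["order-42"]), "ReceiveAndDelete"))  # → []
print(consume_with_crash(SimulatedQueue(["order-42"]), "PeekLock"))          # → ['order-42']
```

Under ReceiveAndDelete the crashed message is simply gone; under PeekLock it reappears once the lock expires and can be processed again (which is also why the guarantee is "at least once" rather than "exactly once").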

I can understand the attraction of using ReceiveAndDelete mode to avoid difficult lock renewal algorithms for the old ILongRunningTasks, as well as being consistent with Redis (and other transports that don't provide reliable messaging). However, we chose Nimbus because it was an easy-to-use, lightweight (lighter than MassTransit) framework for working with Service Bus, and the main reason for using Service Bus (or something similar, like RabbitMQ) is reliable messaging.

We will not be upgrading to V3 if it can't support reliable messaging. Do you see any prospect of supporting it in V3 (perhaps as an option, or as a second transport implementation)?

Andrew Harcourt · Answer 1 · Sat May 21 2016 06:45:26 GMT+0800 (China Standard Time)

Hi @ma499 :) Yes, it's entirely possible that messages will get lost with the approach we've taken for v3. Messages could also be lost, for different reasons, under the "durable" approach provided by ASB/WSB that we used in v2, the key one being immediate retries with no back-off.

With the pluggable transport provider model in v3, it's entirely possible (and probably relatively straightforward) to offer a second transport that relies on the ASB message locking approach for at-least-once delivery. There's some work to be done first on adding an ASB-specific transport (i.e. one that isn't backwards-compatible with WSB and can take advantage of the newer ASB features), but something that uses Receive+Complete could well fall out of that.

Mustafa Arif · Answer 2 · Thu May 26 2016 18:42:39 GMT+0800 (China Standard Time)

If it doesn't bother anyone else, I suspect we'll look at a pluggable transport implementation for PeekLock in due course (in our case it will need to be backwards-compatible with WSB).

I don't agree that immediate retry with no back-off results in messages getting lost. They get dead-lettered (perhaps unreasonably), which means they can at least be handled and replayed. Dead letters I'm fine with; black holes I'm not.
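The v2 behaviour being debated here can be sketched as follows: under PeekLock a failed attempt is redelivered straight away, and once the delivery count hits the queue's maximum (10 by default on Azure Service Bus) the message lands in the dead-letter queue rather than disappearing. This is a toy model; `deliver_with_immediate_retries` and `flaky_handler` are hypothetical names, not Nimbus or SDK APIs.

```python
def deliver_with_immediate_retries(message, handler, max_delivery_count=10):
    """Hypothetical model of PeekLock redelivery with no back-off.

    Each failed attempt abandons the lock, so the broker redelivers the
    message immediately; when the delivery count reaches the limit the
    message is moved to the dead-letter queue instead of being dropped.
    """
    dead_letter_queue = []
    for delivery_count in range(1, max_delivery_count + 1):
        try:
            handler(message)
            return "completed", delivery_count, dead_letter_queue
        except Exception:
            continue  # abandoned: redelivered at once, with no delay
    dead_letter_queue.append(message)
    return "dead-lettered", max_delivery_count, dead_letter_queue


def flaky_handler(message):
    # Simulates a downstream dependency that is briefly unavailable.
    raise ConnectionError("database unavailable")


status, attempts, dlq = deliver_with_immediate_retries("order-42", flaky_handler)
print(status, attempts, dlq)  # → dead-lettered 10 ['order-42']
```

All ten attempts can burn through in well under a second, so a transient outage is enough to dead-letter the message; it is not lost, but it does need to be replayed by hand.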

John Knoop · Answer 3 · Wed Aug 10 2016 04:42:47 GMT+0800 (China Standard Time)

@uglybugger @ma499 I'm also a bit concerned about the at-most-once delivery in the new version, although I'm not familiar with this approach. Is there a design pattern that bridges the possible gaps, or should one just not send critical messages over the bus?

An alternative would be to look at this simple wrapper library for ASB, which is perhaps a better choice for those who just want a simple-to-use, reliable messaging system?

Andrew Harcourt · Answer 4 · Wed Aug 10 2016 20:05:57 GMT+0800 (China Standard Time)

Hey all. Just to clarify: the redelivery strategy is still "at least once". Failed messages will still be redelivered to the appropriate recipients, just on a slightly more sensible basis. The idea was to move away from ASB's mindless "hammer the message repeatedly at someone until it either succeeds or fails immediately" approach.
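The thread doesn't spell out the exact v3 redelivery schedule. As one hypothetical illustration of a "more sensible basis", a failed message can be re-scheduled for later with exponential back-off instead of being redelivered at once. `backoff_delay`, `simulate` and the 1-second base / 5-minute cap are assumptions for this sketch, not Nimbus's actual settings.

```python
import heapq


def backoff_delay(attempt, base_seconds=1.0, cap_seconds=300.0):
    """Hypothetical exponential back-off: 1s, 2s, 4s, ... capped at cap_seconds."""
    return min(base_seconds * 2 ** (attempt - 1), cap_seconds)


def simulate(handler, message, max_attempts=10):
    """Re-schedule failed deliveries for later rather than retrying at once.

    Uses a simulated clock and returns the time of each delivery attempt.
    """
    clock = 0.0
    pending = [(clock, 1, message)]  # (due time, attempt number, message)
    attempt_times = []
    while pending:
        due, attempt, msg = heapq.heappop(pending)
        clock = max(clock, due)  # wait until the redelivery is due
        attempt_times.append(clock)
        try:
            handler(msg, attempt)
        except Exception:
            if attempt < max_attempts:
                next_due = clock + backoff_delay(attempt)
                heapq.heappush(pending, (next_due, attempt + 1, msg))
            # else: hand off to a dead-letter/error queue (not modelled here)
    return attempt_times


def recovers_on_fourth(msg, attempt):
    # A dependency that is down for the first three attempts.
    if attempt < 4:
        raise ConnectionError("still down")


print(simulate(recovers_on_fourth, "order-42"))  # → [0.0, 1.0, 3.0, 7.0]
```

Spacing the attempts out gives a transient fault time to clear before the retry budget is spent, which is the trade-off being contrasted with ASB's immediate redelivery.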

There's certainly scope to add the PeekLock/Complete behaviour back in as an option for the WSB and ASB transports. I don't think it's going to be the default behaviour, given the amount of nuisance it creates versus the value it provides. The key drawbacks are the utter pain of MessageLockLostExceptions whenever a handler exceeds the arbitrary maximum lock duration, and the difficulty of completing messages when the system is under high load.
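A rough model of why long-running handlers are painful under PeekLock: the lock expires after a fixed duration, so the handler has to keep renewing it, and completing after expiry fails. Everything here is hypothetical: `LockLostError` stands in for the SDK's `MessageLockLostException`, and `LockedMessage`, `run_handler` and the 60-second lock are assumptions for illustration.

```python
class LockLostError(Exception):
    """Hypothetical stand-in for the error raised once a message lock has expired."""


class LockedMessage:
    """Hypothetical PeekLock message whose lock expires after lock_duration seconds."""

    def __init__(self, body, received_at, lock_duration=60.0):
        self.body = body
        self.lock_duration = lock_duration
        self.locked_until = received_at + lock_duration

    def renew_lock(self, now):
        # Extending the lock only works while it is still held.
        if now >= self.locked_until:
            raise LockLostError(self.body)
        self.locked_until = now + self.lock_duration

    def complete(self, now):
        # Completing after expiry fails: the broker may already have
        # redelivered the message to another consumer.
        if now >= self.locked_until:
            raise LockLostError(self.body)
        return "completed"


def run_handler(msg, start, handler_seconds, renew_every=None):
    """Simulate a handler of a given duration, optionally renewing the lock.

    Renewals are modelled as happening every renew_every seconds while the
    handler runs; the message is completed when the handler finishes.
    """
    now = start
    end = start + handler_seconds
    try:
        if renew_every is not None:
            while now + renew_every < end:
                now += renew_every
                msg.renew_lock(now)
        return msg.complete(end)
    except LockLostError:
        return "lock lost"


print(run_handler(LockedMessage("order-42", 0.0), 0.0, 90.0))                    # → lock lost
print(run_handler(LockedMessage("order-42", 0.0), 0.0, 90.0, renew_every=30.0))  # → completed
print(run_handler(LockedMessage("order-42", 0.0), 0.0, 90.0, renew_every=70.0))  # → lock lost
```

The third call mimics a loaded process whose renewal fires too late: the lock has already lapsed, so even a handler that succeeded cannot complete its message.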

I'll have a chat with @DamianMac over the next couple of days and we'll see what we can come up with in terms of making it configurable and opt-in rather than painful by default.

Mustafa Arif · Answer 5 · Wed Aug 10 2016 20:46:35 GMT+0800 (China Standard Time)

Thanks Andrew. Look forward to hearing what you come up with.