ThreeDotsLabs / watermill-redisstream

Redis Pub/Sub for the Watermill project, leveraging Redis Stream.

Claim fails for consumers with multiple pending messages

hlubek opened this issue

Scenario:

  • A consumer (i.e. subscriber) in a consumer group is consuming messages and shuts down or crashes without sending an ACK
  • Multiple messages are pending for that consumer (especially likely if handling a message takes a long time)
  • The subscriber is restarted (and gets a new consumer id) -> the first pending message is claimed after DefaultMaxIdleTime
  • Messages are lost, since the old consumer is deleted via XGroupDelConsumer

This is trivial to reproduce: produce the numbers 0-9, handle them in a subscriber, and shut the subscriber down while it is handling a message. After restarting the subscriber (with the same consumer group), one pending message is claimed and processed (after some time), but in most runs at least one message is lost because the consumer is deleted before all of its pending messages were claimed.
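The loss can be modeled without Redis. The following Go sketch is a toy in-memory stand-in for a consumer group's pending entries list, not the library's code; `pendingList`, `claimOne`, and `delConsumer` are hypothetical names. It assumes, as described above, that a single message is claimed and the old consumer is deleted right afterwards, and it relies on the Redis behavior that deleting a consumer also drops whatever pending entries it still owns.

```go
package main

import "fmt"

// pendingList is a toy model of a consumer group's pending entries list:
// it maps a consumer name to the IDs of messages delivered but not yet ACKed.
type pendingList map[string][]string

// claimOne moves the first pending message from one consumer to another,
// mimicking a single XCLAIM. It reports the claimed ID, if there was one.
func (p pendingList) claimOne(from, to string) (string, bool) {
	ids := p[from]
	if len(ids) == 0 {
		return "", false
	}
	id := ids[0]
	p[from] = ids[1:]
	p[to] = append(p[to], id)
	return id, true
}

// delConsumer mimics XGROUP DELCONSUMER: the consumer is removed together
// with its remaining pending entries. It returns how many entries were dropped.
func (p pendingList) delConsumer(name string) int {
	lost := len(p[name])
	delete(p, name)
	return lost
}

func main() {
	// The old consumer died with three unacknowledged messages.
	p := pendingList{"old-consumer": {"1-0", "2-0", "3-0"}}

	// The restarted subscriber claims one message, then the old consumer is deleted.
	id, _ := p.claimOne("old-consumer", "new-consumer")
	lost := p.delConsumer("old-consumer")

	fmt.Printf("claimed %s, lost %d pending messages\n", id, lost)
}
```

In this model every message still pending for the old consumer at deletion time can no longer be claimed by anyone, which matches the lost messages observed in the reproduction.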

See https://gist.github.com/hlubek/1a667ec6050bea703b58ba0036d26cc9 for an example program.

Solution:

The consumer needs to be checked for pending messages before it is deleted. This check is not free from race conditions, though, since the consumer could come back (if an explicit consumer id is used) and receive a message between the check and the deletion.

It would be best not to delete consumers while claiming messages. Instead, there should be a max idle duration for consumers (I'd say that should usually be around multiple hours), with a regular check via XInfoConsumers for consumers that have been idle longer than that threshold and don't have pending messages; only those would be deleted. There should be a way to opt out of deleting consumers automatically for special use cases.

Thank you @hlubek, it is indeed a problem to delete a consumer right after claiming its message.

Utilizing XInfoConsumers to handle consumer removals sounds like an excellent idea. It would be fantastic if you could create a PR for it!