dashbitco / broadway_cloud_pub_sub

A Broadway producer for Google Cloud Pub/Sub

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Feature Discussion: Client-agnostic, automatic lease management

mcrumm opened this issue · comments

Opening this issue to get more input on automatic lease management.

Google Cloud Pub/Sub clients that support Asynchronous Pull (most notably, Python) provide a lease management mechanism behind-the-scenes to keep messages alive while they are being processed.

In python, the Leaser class provides the logic to extend the message ack deadline if it has not yet been acknowledged:
https://github.com/googleapis/google-cloud-python/blob/7e16c08581a059a6c8e40b0403f4af1ebfe0fea6/pubsub/google/cloud/pubsub_v1/subscriber/_protocol/leaser.py#L100

The Leaser is aided by a Histogram that, by default, tracks the 99th percentile of ack requests to determine how long to extend the lease for a given message:
https://github.com/googleapis/google-cloud-python/blob/7e16c08581a059a6c8e40b0403f4af1ebfe0fea6/pubsub/google/cloud/pubsub_v1/subscriber/_protocol/histogram.py

Ideally BroadwayCloudPubSub would provide the same functionality, but we can do so in a client-agnostic way, meaning our implementation would work for Pull requests over both HTTP and gRPC, as well as for Streaming Pull requests (gRPC only), we were ever to decide to implement those.

To keep this client-agnostic, BroadwayCloudPubSub.Producer probably needs to become the Acknowledger so that the Client implementation doesn't need to know about the lease management functionality.

Discussion originally started by @mcrumm in #17 (comment)

This sounds very neat but I also assume it is mostly a corner case, so I am glad to sit and wait until someone has a use case for it. I assume the initial value for ack is configurable, right? If so, that should be good enough for the huge majority of cases.

@josevalim The only potential issue is that the maximum ack deadline you can set on a message is 10 minutes. You can continue to modify the deadline and effectively keep the message checked out forever, but only in 10min increments.

I vaguely recall you posting something re: GenStage consumers, saying that long running processes are perfectly acceptable there, but I suppose that "long" could be subjective in this instance.

Are any of the other connectors subject to similar restrictions on message processing time?

Either way, I thought I might take a pass at implementing something like this from outside the connector first, and see where it gets me. Ultimately that may lead to more insights into what customizing acknowledgements might look like.

Closing this as the need has never arisen, but may revisit in the future :)