ably / ably-asset-tracking-android

Android client SDKs for the Ably Asset Tracking service.

Wait for Ably connection to leave the "suspended" state before performing operations on Ably

KacperKluka opened this issue · comments

When the Ably connection (or a channel) is in the suspended state, no Ably operations will succeed. There is therefore no point in attempting any operation on Ably until the state changes to something else (such as connected). We should use the state to prevent the SDK from performing operations through the Ably wrapper while the state is suspended.

I thought that we already did this? There are definitely at least some places in DefaultAbly where, if the channel is suspended, it will wait for the channel to attach before proceeding with the operation (I mentioned this tangentially in #912).

We had retryChannelOperationIfConnectionResumeFails(), but we decided that the retries should happen in the WorkerQueue and that we shouldn't put arbitrary timeouts on Ably operations that the ably-java SDK will repeat automatically. This issue is more about monitoring the overall Ably connection state and deciding whether or not to perform operations on the channels 🤔

As it stands, it seems like the following behaviours exist for a channel in the SUSPENDED state (links to tests):

Does that mean that for this issue we only need to update the behaviour of connect(), or have I misunderstood?

The retryChannelOperationIfConnectionResumeFails function used by sendEnhancedLocation(), sendRawLocation(), and updatePresenceData() is being removed in #965.

I think the idea behind this ticket is to check the channel state and postpone those actions at the WorkerQueue level.
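To make the "postpone at the queue level" idea concrete, here is a minimal sketch. It is not the SDK's actual WorkerQueue: `ConnectionState` and `StateAwareQueue` are hypothetical names introduced for illustration, the state enum is a simplified stand-in for the real Ably states, and the sketch is single-threaded with no synchronisation.

```kotlin
// Hypothetical, simplified stand-in for the Ably connection states.
enum class ConnectionState { CONNECTING, CONNECTED, DISCONNECTED, SUSPENDED }

// Hypothetical queue: work submitted while SUSPENDED is held back and
// run in submission order once the state changes to anything else.
// Not thread-safe; a real implementation would need synchronisation.
class StateAwareQueue(initialState: ConnectionState) {
    private var state = initialState
    private val postponed = ArrayDeque<() -> Unit>()

    val postponedCount: Int get() = postponed.size

    fun enqueue(work: () -> Unit) {
        if (state == ConnectionState.SUSPENDED) postponed.addLast(work) else work()
    }

    fun onStateChanged(newState: ConnectionState) {
        state = newState
        if (newState == ConnectionState.SUSPENDED) return
        // Flush everything that was postponed, oldest first.
        while (postponed.isNotEmpty()) postponed.removeFirst().invoke()
    }
}
```

Whether postponed work should be run, dropped, or coalesced when the state changes is a per-operation decision; this sketch simply runs it all in order.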

AAT is doing the following actions affected by this issue:

  1. send location - currently, if this operation fails, a retry is attempted with no delay specified, so it can happen immediately. If we instead wait for the channel to leave the suspended state, we can end up with a long queue of locations to send (which may require additional work to preserve their order). I think the best approach here would be to postpone the retry attempt by an additional delay, e.g., 200 ms, using the already implemented location retry logic (MAX_RETRY_COUNT = 1).
  2. update presence data - used by both publisher and subscriber to exchange resolution-related data. Currently, any exception that occurs on the subscriber side is logged and thrown to the caller. On the publisher side, all exceptions are ignored. In the previous implementation, updatePresenceData waited for the channel to attach, with an arbitrary timeout. In this case, we could wait indefinitely until the channel enters the connected state.
  3. connect - this method wraps two operations: creating a channel and entering its presence. I would suggest separating the two, not attempting to enter presence if it cannot succeed, and using workers we already have in place to retry the enter operation.
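For point 1, the delayed retry could be sketched roughly as below. Only `MAX_RETRY_COUNT = 1` and the 200 ms figure come from the comment above; the function name `locationRetryDelayMs` and its parameters are hypothetical, and the real retry logic in the SDK may be structured differently.

```kotlin
// Value taken from the comment above; the existing retry logic allows one retry.
val MAX_RETRY_COUNT = 1

// Hypothetical helper: returns how long to wait before retrying a failed
// location send, or null when no further retry should be attempted.
// The default delay of 200 ms is the example figure suggested above.
fun locationRetryDelayMs(retriesSoFar: Int, delayMs: Long = 200L): Long? =
    if (retriesSoFar < MAX_RETRY_COUNT) delayMs else null
```

Keeping the delay as a parameter means the value can be tuned without touching the decision logic.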

@paddybyers @jaley, what do you think about the suggested approach to this issue?

On 3 - this works well with the refactoring I'm doing as part of #966. I can just tweak the worker slightly so that it always does a presence enter (once the channel is attached) and doesn't rely on ably.connect returning true to determine whether presence has been entered :)

It would be good to avoid the problems of #912 (and hence close that issue) when we implement this too (that is, we should make sure that as soon as we leave the SUSPENDED state, we retry the operation, instead of the current behaviour of waiting to enter the ATTACHED state).
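The difference between the two triggers can be sketched as a pair of predicates over state transitions. `ChannelState` here is a hypothetical enum that mirrors the Ably channel state names, and both function names are made up for illustration; neither is taken from the SDK.

```kotlin
// Hypothetical enum mirroring the Ably channel state names.
enum class ChannelState { INITIALIZED, ATTACHING, ATTACHED, DETACHING, DETACHED, SUSPENDED, FAILED }

// Proposed trigger: retry on the first transition out of SUSPENDED,
// whatever the next state is (taken literally, this includes FAILED).
fun shouldRetryOnLeavingSuspended(previous: ChannelState, current: ChannelState): Boolean =
    previous == ChannelState.SUSPENDED && current != ChannelState.SUSPENDED

// Current behaviour as described above: retry only once ATTACHED is reached.
fun shouldRetryOnAttached(current: ChannelState): Boolean =
    current == ChannelState.ATTACHED
```

For a SUSPENDED to ATTACHING transition, the first predicate fires and the second does not, which is the gap this comment is pointing at.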

We agreed on the suggested solutions on a call, with the retry delay for sending location increased to 1 second.