Incorrect assumption about PUBLISH errors in autopaho can cause indefinite head-of-queue looping
vishnureddy17 opened this issue · comments
In autopaho
, managePublishQueue
assumes that if paho.PublishWithOptions()
returns an error that is not paho.ErrNetworkErrorAfterStored
, the error is either temporary or the connection will drop:
Lines 531 to 553 in a6def52
However, this is not always the case. All the errors returned in the code snippet below for paho.PublishWithOptions()
are neither temporary nor imply a pending disconnection:
Lines 748 to 762 in a6def52
As a result, managePublishQueue
will loop indefinitely on the first entry in the queue in these cases.
Proposed Solution
To solve this, I propose creating a custom error
type called PahoArgumentError
, which will be returned by paho.PublishWithOptions()
in these cases (this error
type will likely be relevant elsewhere in paho
). Then autopaho
can do a type assertion and quarantine any publish in the queue that returned PahoArgumentError
.
Agreed; I did consider doing something similar when introducing this, but felt that it was OK for an initial PR (hard to know when to stop and submit the PR). My thought was something like paho.FatalError
(indicating that it's pointless to retry) because this could also apply if the broker returns something like "Payload format invalid"). Happy with either option (will try to get a bit more work done on this over the next couple of days).
That sounds good, I also think it's worth returning such an error from Subscribe and Unsubscribe, which would be used to distinguish between errors that are due to a client failure/disconnection and errors that are due to something fatally wrong with the subscribe/unsubscribe
I've had an initial go at this; went with your solution (it's probably useful to be able to differentiate between
a range of "fatal errors"). Ideally this would be extended to look at broker response codes (but I don't think that
is as urgent).
closing due to #226, @MattBrittan feel free to reopen if you still think it's important to keep around
Ideally this would be extended to look at broker response codes (but I don't think that
is as urgent).
This is really only doable if #216 is addressed.
Also, even it was possible, I don't think that autopaho
should retry if the ack came back with an error reason code. What if we're close to the server's receive maximum or we are about to run out of packet IDs? Then retrying could cause issues.