eclipse / paho.mqtt.python

AWS Quota Exceeded Infinite Retry

jamwest opened this issue

We run a number of IoT devices that use AWS IoT Core as the MQTT message broker.

We hit an issue where devices were using many GB of cellular upload data per day. Luckily this only occurred in specific cases, but it could have been catastrophic for us.

The cause was AWS's quota of 128 KB per message. AWS disconnects the client when it receives a message exceeding the quota. The problem is that with QoS 1 and a persistent session (clean session disabled), paho.mqtt keeps unacknowledged messages queued and resends them after it reconnects. So our devices were sending messages larger than the limit every ~3 seconds, indefinitely (or until they were restarted).

We found a quick workaround: always check the size of the message before sending it, and add this to the on_disconnect function:

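import paho.mqtt.client as paho

def on_disconnect(client, userdata, rc):
    # rc != 0 means the disconnect was unexpected (not requested by the client).
    if rc != 0:
        # _out_message_mutex and _out_messages are private paho attributes
        # and may change between releases.
        with client._out_message_mutex:
            if client._out_messages:
                # Drop the oldest queued message, assumed to be the rejected one.
                client._out_messages.popitem(last=False)
                print("Removed failing message from queue")

client = paho.Client(...)
client.on_disconnect = on_disconnect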

...
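
For the size check mentioned above, a minimal sketch could look like the following. The helper names (payload_size, publish_if_within_limit) and the constant MAX_PAYLOAD_BYTES are hypothetical, and the limit is assumed to be 128 * 1024 bytes; check the current AWS IoT Core quotas for the exact figure. The client argument is assumed to be any object with a paho-style publish(topic, payload, qos=...) method.

```python
# Assumed limit: 128 KB per published message (verify against AWS IoT Core quotas).
MAX_PAYLOAD_BYTES = 128 * 1024


def payload_size(payload):
    """Return the size in bytes of a payload as it would go on the wire."""
    if payload is None:
        return 0
    if isinstance(payload, str):
        # Strings are sent UTF-8 encoded, so count bytes, not characters.
        return len(payload.encode("utf-8"))
    if isinstance(payload, (bytes, bytearray)):
        return len(payload)
    # Numbers are sent as their string representation.
    return len(str(payload).encode("utf-8"))


def publish_if_within_limit(client, topic, payload, qos=1,
                            max_bytes=MAX_PAYLOAD_BYTES):
    """Publish only if the payload fits; return None when it is dropped."""
    size = payload_size(payload)
    if size > max_bytes:
        # Never hand an oversized message to the client, so it cannot end up
        # in the retry queue.
        print(f"Dropping {size}-byte message for {topic}: over {max_bytes} bytes")
        return None
    return client.publish(topic, payload, qos=qos)
```

Calling publish_if_within_limit(client, topic, payload) instead of client.publish(...) keeps oversized messages out of the outgoing queue entirely, so the on_disconnect workaround only has to act as a safety net.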

Maybe there is a better way to handle this in the package? It is a gotcha with some potentially dire consequences.

Hopefully this helps someone in the future.

The only real mechanism that MQTT v3 provides to deal with invalid messages is to drop the connection. This means there is no real way for the client to tell what the issue is.

If a Server implementation does not authorize a PUBLISH to be performed by a Client; it has no way of informing that Client. It MUST either make a positive acknowledgement, according to the normal QoS rules, or close the Network Connection [MQTT-3.3.5-2].

MQTT v5 provides the ability to return acknowledgments with Reason Codes indicating that the message has not been accepted. Perhaps consider moving to v5 (I believe IoT Core supports this and would guess it will return an acknowledgment indicating "Quota exceeded").
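
As a rough sketch of what handling such an acknowledgment might look like, the callback below records publishes that the broker rejected. It assumes paho-mqtt 2.x with the VERSION2 callback API, where on_publish receives a reason_code object exposing is_failure and value. The rejected list is a hypothetical name, and whether IoT Core actually answers an oversized message with a failing PUBACK rather than a disconnect is a guess that would need testing.

```python
# MQTT v5 PUBACK reason code 0x97 means "Quota exceeded".
QUOTA_EXCEEDED = 0x97

# Hypothetical record of (mid, reason code value) for rejected publishes.
rejected = []


def on_publish(client, userdata, mid, reason_code, properties):
    """Assumed paho-mqtt 2.x VERSION2 signature for the on_publish callback."""
    if reason_code.is_failure:
        # The broker acknowledged the message but did not accept it.
        rejected.append((mid, reason_code.value))
        if reason_code.value == QUOTA_EXCEEDED:
            print(f"Message {mid} rejected: quota exceeded")


# Wiring, assuming paho-mqtt 2.x is installed:
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, protocol=mqtt.MQTTv5)
#   client.on_publish = on_publish
```

Because a rejected message still receives an acknowledgment, the client has no reason to keep it queued for resending, which is what avoids the retry loop described in this issue.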

Closing this as I believe I have answered the question.