eclipse / paho.golang

QOS1: semaphore: released more than held

MattBrittan opened this issue

Describe the bug

Encountered this on a node I'm using for testing; no more debugging info available. Currently don't have enough info to reproduce this.

panic: semaphore: released more than held

goroutine 48 [running]:
golang.org/x/sync/semaphore.(*Weighted).Release(0x19d4930, 0x1)
        golang.org/x/sync@v0.4.0/semaphore/semaphore.go:103 +0x174
github.com/eclipse/paho.golang/paho/session/state.(*State).PacketReceived(0x1848e00, 0x19a3e20, 0x1965140)
        github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/session/state/state.go:493 +0x374
github.com/eclipse/paho.golang/paho.(*Client).incoming(0x18ae480)
        github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:449 +0x444
github.com/eclipse/paho.golang/paho.(*Client).Connect.func4()
        github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:313 +0xe0
created by github.com/eclipse/paho.golang/paho.(*Client).Connect in goroutine 36
        github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:310 +0xe88

It appears that the client received an unexpected PUBACK. As the semaphore code needs some work regardless, I'm not going to dig deeper right now (I will make some changes to how this works and add extra logging at the same time).

relevant ClientConfig settings:

CleanStartOnInitialConnection: false,            // the default
SessionExpiryInterval:         1296000,          // 15 days

To reproduce

Started the remote node; the panic occurred almost immediately. This node publishes QOS1 messages; I'm guessing that the issue is that it received an unexpected ACK (currently the code releases the semaphore without checking that there is an entry in the session state map).

Upon restart the app ran normally (local session state was empty).

Expected behaviour

Does not panic! (initial change would be to log this)
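
As a rough illustration of the "log rather than panic" behaviour, the sketch below checks that an acknowledged packet ID is actually being tracked before anything is released. The `inflightTracker` type and its methods are hypothetical names invented for this example; this is not the paho.golang session state code.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// inflightTracker is a hypothetical stand-in for the session state map.
// It is an illustration only, not the paho.golang implementation.
type inflightTracker struct {
	mu       sync.Mutex
	inflight map[uint16]struct{} // packet IDs awaiting an acknowledgement
}

func newInflightTracker() *inflightTracker {
	return &inflightTracker{inflight: make(map[uint16]struct{})}
}

// sent records that a QOS1 PUBLISH with this packet ID awaits its PUBACK.
func (t *inflightTracker) sent(id uint16) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.inflight[id] = struct{}{}
}

// acked reports whether the ack matched a tracked packet. An unknown ID is
// logged and ignored, so the caller knows not to release an inflight slot.
func (t *inflightTracker) acked(id uint16) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.inflight[id]; !ok {
		log.Printf("unexpected ack for packet ID %d; ignoring", id)
		return false
	}
	delete(t.inflight, id)
	return true
}

func main() {
	tr := newInflightTracker()
	tr.sent(1)
	fmt.Println(tr.acked(1)) // matched: safe to release a slot
	fmt.Println(tr.acked(1)) // duplicate ack: logged, nothing released
	fmt.Println(tr.acked(7)) // never sent: logged, nothing released
}
```

A caller would release its inflight slot only when `acked` returns true, so a stray or duplicate ACK produces a log entry instead of a crash.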

Software used:

  • Server: Mosquitto 2.0.15
  • Client @master (v0.12.1-0.20231019064150-27c89b5450f2)

Looking into this further, we need to move away from using golang.org/x/sync/semaphore to control the number of inflight messages. This is because the MQTT v5 spec states:

The send quota is not incremented if it is already equal to the initial send quota. The attempt to increment above
the initial send quota might be caused by the re-transmission of a PUBREL packet after a new Network Connection is
established.

If this happens with a semaphore then we get the panic seen in my issue (note: I am not saying this was the cause of that particular panic, just that it's possible). I am raising a change to implement this (it should prevent this crash and hopefully provide insight into the cause).
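
To make the spec rule concrete, here is a minimal sketch of a send quota whose release is a no-op once the quota is back at its initial value. The `sendQuota` type is a hypothetical name for this example and is not the change that was made to paho.golang. It uses only the standard library and, to stay short, omits the context cancellation a real client would need.

```go
package main

import (
	"fmt"
	"sync"
)

// sendQuota is a hypothetical counter illustrating the spec rule quoted
// above. It is a sketch, not the paho.golang implementation.
type sendQuota struct {
	mu      sync.Mutex
	cond    *sync.Cond
	initial uint16 // quota granted at connection time
	current uint16 // slots currently available
}

func newSendQuota(initial uint16) *sendQuota {
	q := &sendQuota{initial: initial, current: initial}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// acquire blocks until a slot is available, then takes it.
func (q *sendQuota) acquire() {
	q.mu.Lock()
	defer q.mu.Unlock()
	for q.current == 0 {
		q.cond.Wait()
	}
	q.current--
}

// release returns a slot. If the quota already equals its initial value the
// call does nothing and returns false, rather than panicking.
func (q *sendQuota) release() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.current == q.initial {
		return false
	}
	q.current++
	q.cond.Signal()
	return true
}

// available returns the number of free slots.
func (q *sendQuota) available() uint16 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.current
}

func main() {
	q := newSendQuota(2)
	q.acquire()
	fmt.Println(q.release())   // slot returned
	fmt.Println(q.release())   // already at the initial quota: ignored
	fmt.Println(q.available()) // still the initial quota
}
```

The boolean returned by `release` gives the caller a place to log the unexpected acknowledgement, which is the extra insight a semaphore panic does not offer.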

Closing this because the fix will prevent it recurring and represents a fairly major change (meaning the issue as reported no longer exists).