QOS1: semaphore: released more than held
MattBrittan opened this issue
Describe the bug
Encountered this on a node I'm using for testing; no more debugging info available. Currently don't have enough info to reproduce this.
panic: semaphore: released more than held
goroutine 48 [running]:
golang.org/x/sync/semaphore.(*Weighted).Release(0x19d4930, 0x1)
golang.org/x/sync@v0.4.0/semaphore/semaphore.go:103 +0x174
github.com/eclipse/paho.golang/paho/session/state.(*State).PacketReceived(0x1848e00, 0x19a3e20, 0x1965140)
github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/session/state/state.go:493 +0x374
github.com/eclipse/paho.golang/paho.(*Client).incoming(0x18ae480)
github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:449 +0x444
github.com/eclipse/paho.golang/paho.(*Client).Connect.func4()
github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:313 +0xe0
created by github.com/eclipse/paho.golang/paho.(*Client).Connect in goroutine 36
github.com/eclipse/paho.golang@v0.12.1-0.20231019064150-27c89b5450f2/paho/client.go:310 +0xe88
It appears that the client received an unexpected PUBACK. As the semaphore code needs some work regardless, I'm not going to dig deeper right now (I will make some changes to how this works and add extra logging at the same time).
relevant ClientConfig settings:
CleanStartOnInitialConnection: false, // the default
SessionExpiryInterval: 1296000, // 15 days
To reproduce
Started the remote node; the panic occurred almost immediately. This node publishes QOS1 messages; I'm guessing that the issue is that it received an unexpected ACK (currently the code releases the semaphore without checking that there is an entry in the session state map).
Upon restart the app ran normally (local session state was empty).
Expected behaviour
Does not panic! (initial change would be to log this)
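As a rough illustration of the "log rather than panic" behaviour, the sketch below only reports an ACK as valid when a matching entry exists. It is not the paho.golang code: the `inflight` type, its `pending` map, and the `add`/`ack` methods are hypothetical names standing in for the session state map described in this issue.

```go
package main

import (
	"log"
	"sync"
)

// inflight is a hypothetical stand-in for the session state map:
// it records the packet IDs of QOS1 publishes awaiting a PUBACK.
type inflight struct {
	mu      sync.Mutex
	pending map[uint16]bool
}

func newInflight() *inflight {
	return &inflight{pending: make(map[uint16]bool)}
}

// add records that a publish with the given packet ID is awaiting an ACK.
func (s *inflight) add(id uint16) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[id] = true
}

// ack reports whether the ACK matched a known in-flight packet.
// An unexpected ACK is logged and ignored; the caller should free a
// send slot only when ack returns true.
func (s *inflight) ack(id uint16) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.pending[id] {
		log.Printf("unexpected PUBACK for packet ID %d; ignoring", id)
		return false
	}
	delete(s.pending, id)
	return true
}
```

With this shape, a duplicate or stray ACK results in a log line rather than a second release for the same packet.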
Software used:
- Server: Mosquitto 2.0.15
- Client @master (v0.12.1-0.20231019064150-27c89b5450f2)
Looking into this further, we need to move away from using golang.org/x/sync/semaphore to control the number of inflight messages. This is because the spec states:
The send quota is not incremented if it is already equal to the initial send quota. The attempt to increment above
the initial send quota might be caused by the re-transmission of a PUBREL packet after a new Network Connection is
established.
If this happens with a semaphore then we get the panic seen in my issue (note: I am not saying this was the cause of that particular panic; just that it's possible). I am raising a change to replace the semaphore (this should prevent the crash and hopefully provide insight into the cause).
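To make the quota rule quoted above concrete, here is a minimal sketch of a counter that ignores an increment once it is back at its initial value. This is an assumption about one possible shape, not the change that was raised: `sendQuota` and its methods are hypothetical names, and the non-blocking `tryAcquire` is a simplification (a real client would presumably need to wait for a free slot).

```go
package main

import "sync"

// sendQuota is a hypothetical capped counter for inflight messages.
// Unlike a weighted semaphore, releasing at full quota is a no-op.
type sendQuota struct {
	mu      sync.Mutex
	initial int // initial send quota (for example, the server's Receive Maximum)
	avail   int // slots currently available
}

func newSendQuota(initial int) *sendQuota {
	return &sendQuota{initial: initial, avail: initial}
}

// tryAcquire takes one slot and reports whether one was available.
func (q *sendQuota) tryAcquire() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.avail == 0 {
		return false
	}
	q.avail--
	return true
}

// release returns one slot. If the quota already equals the initial
// quota it is left unchanged and false is returned, so the caller can
// log the event instead of crashing.
func (q *sendQuota) release() bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.avail >= q.initial {
		return false
	}
	q.avail++
	return true
}

// available returns the number of free slots.
func (q *sendQuota) available() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.avail
}
```

Returning a boolean from `release` leaves the decision about logging to the caller, which fits the stated aim of gaining insight into why an extra ACK arrived.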
Closing this because the fix will prevent it recurring and represents a fairly major change (meaning the issue as reported no longer exists).