tendermint / tmkms

Key Management service for Tendermint Validator nodes

KMS connection failure: "session limit of 1048576 messages exceeded"

mdyring opened this issue

We've just experienced the error below ("signing operation failed") on the tmkms side while connected to an irisd instance for irishub.

Because the current iris release does not support automatic KMS connection recovery, manual intervention was required, which led to some missed blocks in the interim.

A quick search of the GitHub repo doesn't turn up anything related to this, so I'm stuck trying to work out what would cause this error. Any help is appreciated.

18:31:35 [INFO] [irishub@tcp://xxxx:27659] connected to validator successfully
06:15:14 [ERROR] [irishub@tcp://xxxx:27659] signing operation failed: protocol error: session limit of 1048576 messages exceeded: protocol error: session limit of 1048576 messages exceeded
06:15:15 [INFO] KMS node ID: DD7036834704E2CFF8C7B35C68F8933D18ECA2E8
06:32:23 [ERROR] [irishub@tcp://xxxx:27659] I/O error

These errors are supposed to be handled internally within yubihsm-rs; however, it doesn't look like that happened correctly.

I'll see if I can write some tests to reproduce and handle this better.

@zmanian also commented on Twitter: "This sounds like a secret connection needing to re-key and the lack of auto restart on the Tendermint side causing a failure"

Yep, that's correct

I opened up a PR to automatically initiate a new session and retry sending the command in the event this occurs:

tendermint/yubihsm-rs#203
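The "open a new session and retry" behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the actual yubihsm-rs code: `Session`, `Client`, `send_command`, and `Error::SessionLimitExceeded` are hypothetical names, and the per-session cap is modelled as a simple client-side message counter (the log above reports the limit as 1048576 messages, i.e. 2^20).

```rust
/// Hypothetical per-session message cap, matching the 1048576 (2^20) figure in the log.
const SESSION_MESSAGE_LIMIT: u32 = 1 << 20;

/// Hypothetical error type; only the case relevant to this issue is modelled.
#[derive(Debug, PartialEq)]
enum Error {
    SessionLimitExceeded,
}

/// Hypothetical stand-in for an authenticated HSM session with a message budget.
struct Session {
    id: u32,
    messages_sent: u32,
    limit: u32,
}

impl Session {
    fn open(id: u32, limit: u32) -> Self {
        Session { id, messages_sent: 0, limit }
    }

    /// Sends one command, failing once the session's message budget is used up.
    fn send(&mut self, cmd: &str) -> Result<String, Error> {
        if self.messages_sent >= self.limit {
            return Err(Error::SessionLimitExceeded);
        }
        self.messages_sent += 1;
        Ok(format!("session {}: {}", self.id, cmd))
    }
}

/// Hypothetical client that owns the current session.
struct Client {
    session: Session,
    next_session_id: u32,
    limit: u32,
}

impl Client {
    fn new(limit: u32) -> Self {
        Client { session: Session::open(1, limit), next_session_id: 2, limit }
    }

    /// On a session-limit error, opens a fresh session and retries the command once.
    fn send_command(&mut self, cmd: &str) -> Result<String, Error> {
        match self.session.send(cmd) {
            Err(Error::SessionLimitExceeded) => {
                self.session = Session::open(self.next_session_id, self.limit);
                self.next_session_id += 1;
                // A single retry: if the fresh session also fails, the error propagates.
                self.session.send(cmd)
            }
            other => other,
        }
    }
}

fn main() {
    // Use a tiny limit (instead of SESSION_MESSAGE_LIMIT) so the rollover is visible.
    let mut client = Client::new(2);
    for i in 0..5 {
        let resp = client.send_command(&format!("sign #{}", i)).expect("command failed");
        println!("{}", resp);
    }
    println!("real-world limit in the log: {}", SESSION_MESSAGE_LIMIT);
}
```

Retrying exactly once keeps the failure mode bounded: a transient "session exhausted" condition is absorbed transparently, while a persistent problem (e.g. a new session that cannot be used either) still surfaces as an error to the caller instead of looping.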

The above PR was included in yubihsm-rs v0.25.0, which is included in #259 (which I will hopefully land today)

#259 has landed, so I'll close this out.