tendermint / tmkms

Key Management service for Tendermint Validator nodes

Remove signal handling?

tarcieri opened this issue

I've had several anecdotal reports of the KMS hanging in the SIGINT/SIGTERM handler. Here's one:

https://twitter.com/validator_net/status/1156387763320696834

The handlers were added in #161 (cc @thanethomson), but to me they feel like signal handlers for the sake of having signal handlers. Otherwise the KMS is stateless, except for the double-signing state files, which are persisted using the atomicwrites crate, making them kill -9 safe-ish.
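For context on why atomic persistence makes a state file "kill -9 safe-ish", here is a minimal, std-only sketch of the usual write-to-temp-then-rename technique. It illustrates the general idea under stated assumptions and is not the atomicwrites crate's API or the KMS's actual code; `write_atomically` and the state file name and contents are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `contents` so that a process
/// restarted after `kill -9` sees either the old file or the new one,
/// never a half-written file.
fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Stage the new contents in a sibling temp file so the final rename
    // stays within one filesystem.
    let tmp_path = path.with_extension("tmp");
    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
        // Push the data to disk before it becomes visible under `path`.
        tmp.sync_all()?;
    }
    // On POSIX systems, rename(2) atomically replaces the destination.
    // A fully durable version would also fsync the parent directory
    // afterwards (hence "safe-ish"); that step is omitted here.
    fs::rename(&tmp_path, path)
}

fn main() -> io::Result<()> {
    // Hypothetical state file name and contents, for illustration only.
    let path = std::env::temp_dir().join("hypothetical_sign_state.txt");
    write_atomically(&path, b"height=100 round=0 step=3\n")?;
    print!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

If the process is killed before the rename, the old file is untouched and only a stray `.tmp` file is left behind; if it is killed after, the new contents are already in place.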

This is exacerbated by the lack of network timeouts (#310), so the signal handler ends up blocking indefinitely while joining threads that are stuck in blocking I/O operations that never return.
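To illustrate how a missing timeout turns a clean shutdown into a hang, here is a minimal sketch using only Rust's standard library: a worker thread reads from a TCP peer that never sends anything. With `TcpStream::set_read_timeout` set, the read fails with `WouldBlock` or `TimedOut` (the error kind varies by platform) and the thread can be joined; with `None`, the same `join` would block forever. `read_with_timeout` and the 200 ms value are hypothetical, and this is only one possible approach, not necessarily the one #310 settles on.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical worker: read one chunk from `stream`. If `timeout` is `None`
/// and the peer goes silent, `read` blocks forever, and so does any thread
/// that `join`s this one.
fn read_with_timeout(mut stream: TcpStream, timeout: Option<Duration>) -> io::Result<usize> {
    stream.set_read_timeout(timeout)?;
    let mut buf = [0u8; 64];
    stream.read(&mut buf)
}

fn main() -> io::Result<()> {
    // A local peer that accepts the connection but never writes,
    // standing in for a stalled remote connection.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let client = TcpStream::connect(listener.local_addr()?)?;
    let (_silent_peer, _) = listener.accept()?; // kept open until main returns

    let worker = thread::spawn(move || {
        read_with_timeout(client, Some(Duration::from_millis(200)))
    });

    // Because the read times out, this join returns instead of hanging.
    match worker.join().expect("worker panicked") {
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            println!("read timed out; worker exited and was joined")
        }
        other => println!("unexpected result: {:?}", other),
    }
    Ok(())
}
```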

#310 spells out several ways to solve the network timeout issue, but in the short term I think it's worth ripping out the signal handler. I don't think it buys us anything: it only gets in the way of SIGINT/SIGTERM otherwise clobbering the process, and in practice it is causing validator outages.