How to restart container when one of the processes fails? How to track service health?

Question

anovv opened this issue 3 years ago

I'm running into an issue: if one of the cryptostore processes (e.g. the aggregator or a collector) fails, the container keeps running and is not restarted. What is the best practice for restarting the container when one of its processes fails, for example under Kubernetes?
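
To make the desired behavior concrete, here is a minimal sketch of a supervisor loop, not cryptostore's actual code. It assumes the child processes are `multiprocessing.Process`-like objects (anything with `name`, `is_alive()`, `terminate()` and `join()`); the names `find_dead` and `supervise` are hypothetical.

```python
import time


def find_dead(processes):
    """Return the names of child processes that are no longer running."""
    return [p.name for p in processes if not p.is_alive()]


def supervise(processes, poll_interval=1.0):
    """Poll the children; when any of them has exited, stop the rest.

    Returns a non-zero exit code so the caller can exit the container's
    main process and let the orchestrator restart the container.
    """
    while True:
        if find_dead(processes):
            # One child is gone: shut down the survivors as well, so the
            # container does not keep running in a half-working state.
            for p in processes:
                if p.is_alive():
                    p.terminate()
            for p in processes:
                p.join(timeout=5)
            return 1
        time.sleep(poll_interval)
```

The main process would then finish with something like `sys.exit(supervise([aggregator, collector]))`. Once the main process exits with a non-zero code, Kubernetes restarts the container according to the pod's `restartPolicy`, and plain Docker does the same with `--restart on-failure`.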

For example, Kubernetes has a mechanism called a liveness probe, which periodically calls a script or endpoint in the container to check its health and decide whether the container should be restarted. Should cryptostore have something similar, i.e. a small server reporting the readiness/liveness of the service?
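
As a rough illustration of such a "small server", the sketch below uses only the Python standard library. It is an assumption about how this could look, not an existing cryptostore feature: the `/healthz` path, port 8080, and the `is_healthy` callback are all hypothetical choices.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(is_healthy):
    """Build a request handler that reports the result of is_healthy()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            ok = is_healthy()
            body = b"ok" if ok else b"unhealthy"
            # 200 tells the probe the service is fine; 503 marks it as failed.
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe requests out of the service log

    return HealthHandler


def start_health_server(is_healthy, host="0.0.0.0", port=8080):
    """Serve the health endpoint from a daemon thread and return the server."""
    server = HTTPServer((host, port), make_handler(is_healthy))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A Kubernetes `livenessProbe` with an `httpGet` on path `/healthz` and port 8080 would treat the 503 response as a failure and restart the container after the configured number of failed attempts. The `is_healthy` callback could, for instance, check that all child processes are still alive.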

Another scenario I encountered: the collector and the other processes were running fine, but Redis was not receiving any messages because of an exception in the feed handler. Should we check the message arrival rate for each key in Redis?
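
One way such an arrival-rate check could work is sketched below, under stated assumptions: each key's length is sampled twice, and a key whose length did not change in between is reported as stalled. The function name `find_stalled_keys` and the `get_length` callback are hypothetical; with redis-py, `get_length` could be the client's `xlen` method for streams or `zcard` for sorted sets.

```python
import time


def find_stalled_keys(keys, get_length, interval=30.0, sleep=time.sleep):
    """Return the keys whose length did not change over `interval` seconds.

    get_length(key) must return the current number of entries under `key`.
    """
    before = {k: get_length(k) for k in keys}
    sleep(interval)
    after = {k: get_length(k) for k in keys}
    # An unchanged length over the whole interval suggests no new messages.
    return [k for k in keys if after[k] == before[k]]
```

This is only a heuristic: if entries are trimmed or deleted at the same rate as they arrive, the length can stay constant even though messages are flowing, and a quiet market can look like a stalled feed. The interval would need to be tuned per feed, and the result could feed into a health endpoint or an alert.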

In general, what are the guidelines for tracking service health?