k8s: Prometheus missing after restart of pod

Question

0xErnie opened this issue 3 years ago

This is a follow-up to #32.

We are running ClusterControl on Kubernetes.
Our manifests can be found in this gist.
After the pod is deleted, Prometheus fails to start in the replacement pod.

Steps to reproduce:

1. Run ClusterControl on Kubernetes
2. Enable agent-based monitoring on one cluster
3. Wait until monitoring is ready
4. Delete the pod
5. Wait until a new pod is running
6. Check the monitoring for that cluster
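
For anyone who wants to script steps 4 and 5 (delete the pod, wait for the new one), here is a minimal Python sketch that builds the corresponding kubectl invocations. The pod name, namespace, and label selector are hypothetical placeholders and not taken from the gist; adjust them to your own manifests. Depending on the controller, `kubectl wait` may need a retry if the replacement pod has not been created yet.

```python
import subprocess


def reproduce_commands(pod, namespace, selector, timeout="300s"):
    """Build the kubectl commands for 'delete the pod' and 'wait for a new pod'.

    All arguments are placeholders for your own deployment (assumptions,
    not values from the original manifests).
    """
    return [
        # Step 4: delete the running ClusterControl pod.
        ["kubectl", "delete", "pod", pod, "-n", namespace],
        # Step 5: block until a pod matching the selector reports Ready.
        ["kubectl", "wait", "--for=condition=Ready", "pod",
         "-l", selector, "-n", namespace, f"--timeout={timeout}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names, shown in dry-run mode so nothing is deleted.
    run(reproduce_commands("clustercontrol-0", "default", "app=clustercontrol"))
```

With dry_run left at its default the script only prints the commands, so it is safe to try before pointing it at a real cluster.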

What I expect to happen:
Monitoring should stay up and running.

What happens:
Monitoring is not working anymore.

Jobs named "recovering monitoring system" are flooding the job log.
cc-failed-recovery.txt

Temporary workaround:

1. Open the monitoring view of one cluster
2. Trigger "Re-Enable Agent Based Monitoring"
3. Wait until the job is finished
cc-redeployment.txt

Ashraf Sharif · Answer 1 · Thu Oct 14 2021 19:21:35 GMT+0800 (China Standard Time)

This request requires Prometheus to be preloaded into the image (otherwise, you would have to re-enable the monitoring every time a pod is rescheduled). I will need to spend some time on testing. Working on it now.

Ashraf Sharif · Answer 2 · Fri Oct 22 2021 18:24:38 GMT+0800 (China Standard Time)

Hi @0xErnie,

I have pushed commit c2e0e13, so the image is now pre-built with Prometheus. It is available under the following tags on Docker Hub: latest, 1.9.0-3, and 1.9.0.
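
To pick up the pre-built image on an existing deployment, something along these lines might work. This is only a sketch: the deployment name, container name, and image repository below are hypothetical placeholders (the thread does not state them), and only the three tags are taken from this answer.

```python
# Tags listed in this answer as containing the pre-built Prometheus.
KNOWN_TAGS = ("latest", "1.9.0-3", "1.9.0")


def set_image_command(deployment, container, repository, tag, namespace="default"):
    """Build a 'kubectl set image' command that pins the container to a tag.

    deployment, container, and repository are placeholders for your own
    setup (assumptions, not values from the thread).
    """
    if tag not in KNOWN_TAGS:
        raise ValueError(f"tag {tag!r} is not one of {KNOWN_TAGS}")
    return [
        "kubectl", "set", "image",
        f"deployment/{deployment}",
        f"{container}={repository}:{tag}",
        "-n", namespace,
    ]


if __name__ == "__main__":
    # Hypothetical names; prints the command instead of running it.
    print(" ".join(set_image_command(
        "clustercontrol", "clustercontrol", "example/clustercontrol", "1.9.0-3")))
```

Pinning a versioned tag such as 1.9.0-3 rather than latest makes it easier to tell which image a rescheduled pod is actually running.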

I am closing this issue now. Thank you.