k8s: Prometheus missing after restart of pod
0xErnie opened this issue · comments
Alexander Kauerz commented
This is a follow-up to #32.
We are running ClusterControl on Kubernetes.
Our manifests can be found in this gist.
After the pod is deleted, Prometheus fails to start in the replacement pod.
Steps to reproduce:
- Run ClusterControl on Kubernetes
- Enable agent based monitoring on one cluster
- Wait until monitoring is ready
- Delete the pod
- Wait until a new pod is running
- Check the monitoring for one cluster
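The "delete the pod" and "wait until a new pod is running" steps can be scripted. This is only a rough sketch, not taken from our manifests: the helper names `repro_commands` and `run_repro` and the label selector `app=clustercontrol` are made up for illustration, so substitute whatever label and namespace your deployment actually uses. It also assumes the controller has already created the replacement pod by the time `kubectl delete` returns; if not, `kubectl wait` may report that no resources match and needs a retry.

```python
import subprocess


def repro_commands(selector, namespace="default", timeout="300s"):
    """Build the kubectl invocations for deleting the pod and waiting for its replacement.

    `selector` is a label selector such as "app=clustercontrol" (hypothetical label).
    """
    target = ["pod", "-l", selector, "-n", namespace]
    return [
        # Delete the running ClusterControl pod.
        ["kubectl", "delete", *target],
        # Block until the replacement pod reports the Ready condition.
        ["kubectl", "wait", "--for=condition=Ready", *target, f"--timeout={timeout}"],
    ]


def run_repro(selector, namespace="default"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in repro_commands(selector, namespace):
        subprocess.run(cmd, check=True)
```

Calling `run_repro("app=clustercontrol")` against the cluster performs both steps; the monitoring check in the last step still has to be done in the ClusterControl UI.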
What I expect to happen:
Monitoring should stay up and running.
What happens:
Monitoring is not working anymore.
Jobs named "recovering monitoring system" are flooding the job log.
cc-failed-recovery.txt
Temporary workaround:
- Open the monitoring view of one cluster
- Trigger "Re-Enable Agent Based Monitoring"
- Wait until the job is finished
cc-redeployment.txt
Ashraf Sharif commented
This request requires Prometheus to be preloaded into the image (otherwise, you would have to re-enable the monitoring every time a pod is rescheduled). I will need to spend some time on testing. Working on it now.
Ashraf Sharif commented