mongodb / mongodb-kubernetes-operator

MongoDB Community Kubernetes Operator

Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory

balait4 opened this issue

I have the same issue as mentioned in the already closed issue #959.

Others are facing this as well, and the fix there only worked for 4.2.6.

I tried versions 4.2 and 6.0, and both still show the issue below:

```
Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
goroutine 1 [running]:
main.main()
        /workspace/cmd/readiness/main.go:217 +0x19a
```

My operator version is 0.7.9.
The issue occurs with MongoDB versions 4.2 and 6.0.

Is there any update on this, or an existing workaround? Thanks.

I also had this problem on 6.0.6 in operator version 0.8.0 and started following this issue.

The problem sorted itself out today when I made some fixes to my TLS certificate configuration. I don't know if this is the same problem you are facing, but if you are using TLS, the connection string used by the readiness probe requires the TLS certificates to be valid and to match the name of the service.

My assumption is the agent-health-status.json file is not written to if the probe never connects to the service successfully in the first place.
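
For anyone hitting this with TLS enabled, this is roughly what the TLS section of a MongoDBCommunity resource looks like. It is only a sketch: the resource, secret, and config map names are placeholders, and the field names follow the operator's TLS documentation as I recall them, so verify them against the CRD in your cluster (e.g. `kubectl explain mongodbcommunity.spec.security.tls`).

```yaml
# Sketch of a TLS-enabled MongoDBCommunity resource. All names are placeholders.
# The certificate referenced by certificateKeySecretRef must be valid and its
# SANs must cover the service/pod DNS names, otherwise the agent never becomes
# healthy and the readiness probe finds no agent-health-status.json to read.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb
spec:
  members: 3
  type: ReplicaSet
  version: "6.0.6"
  security:
    tls:
      enabled: true
      certificateKeySecretRef:
        name: example-mongodb-tls      # secret containing tls.crt / tls.key
      caConfigMapRef:
        name: example-mongodb-ca       # config map containing ca.crt
```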

Thanks for the reply. In my case I still haven't enabled or configured TLS. I built my own images for versions 4.2 and 6.0, and with those I get this issue. If I use docker.io/mongo:4.2.6 or 6.0.6, it works fine.

@balait4 I had the same issue with MongoDB 6.0.5 when setting the number of members to either 1 or 2. Setting it to 3 fixed the issue.

What is your ReplicaSet member count?
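
For reference, the member count is the `spec.members` field on the MongoDBCommunity resource. A minimal sketch with placeholder names, following the observation above that 3 members worked where 1 or 2 did not; the authentication/users block mirrors the operator's sample resource, so adjust it to your setup:

```yaml
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb              # placeholder name
spec:
  members: 3                         # 1 or 2 reportedly triggered the probe failure; 3 worked
  type: ReplicaSet
  version: "6.0.5"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: admin-password         # placeholder secret holding the password
      roles:
        - name: root
          db: admin
      scramCredentialsSecretName: admin-scram
```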

Thanks, yes, it works for me too. But one question: when I update the MongoDBCommunity resource with any change, the operator does not reconcile it straight away; I need to delete the StatefulSet, and only then is the StatefulSet recreated with the new configuration. Is this expected?

@balait4 I guess it depends on the changes you want to implement. For the member count, for example, once you modify the manifest, save it and run kubectl apply, it should work.

But if you want to add something like certificates, you would need to provision other resources, and in that case deleting and recreating the StatefulSet is the way to go.

Still, it shouldn't be a problem, since you keep the PVCs; once your database is back up, everything should return to normal.
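
To make that concrete, here is a sketch of the reconcile-in-place flow (resource and file names are placeholders): change `spec.members` in the manifest, re-apply it, and let the operator scale the existing StatefulSet; no manual deletion is needed for this kind of change.

```yaml
# Edit the manifest and re-apply it; the operator reconciles the change in place:
#
#   kubectl apply -f example-mongodb.yaml
#   kubectl get mongodbcommunity example-mongodb -o yaml   # watch the status while it reconciles
#
# Only for changes that need extra resources (e.g. certificates) would you fall
# back to deleting the StatefulSet; the PVCs created from its volumeClaimTemplates
# are kept by default, so the data survives the recreation.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb
spec:
  members: 3          # e.g. changed from 2; applied without deleting anything
  type: ReplicaSet
  version: "6.0.6"
```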

Thanks for the reply!

This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.

I am having the same issue with mongo:6.0.8. After changing the image version to mongo:6.0.6, the problem was resolved.

Could this be an issue with the mongo:6.0.8 image, or an operator issue?

I think the issue is with the operator. When I completely deleted the replica set and redeployed it, the problem still occurred, even with mongo:6.0.6.

When I patched the deployment with a new mongo image, everything started working fine.

The trick is to change the MongoDB version and apply the change without deleting the deployment.
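
A sketch of that workaround, with placeholder names: the version bump is only a change to `spec.version`, applied (or patched) without deleting the MongoDBCommunity resource or its StatefulSet first.

```yaml
# In-place version change; nothing is deleted first.
#
#   kubectl patch mongodbcommunity example-mongodb --type merge \
#     -p '{"spec":{"version":"6.0.6"}}'
#
# or edit the manifest and re-apply it:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb
spec:
  members: 3
  type: ReplicaSet
  version: "6.0.6"    # e.g. changed from 6.0.8 and re-applied
```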

Hey, this relates to the readinessProbe used by the operator to define readiness of the pods.

It should be a red herring and mostly harmless, since the readinessProbe eventually recovers.

Having said that, this has been fixed in newer versions (starting with 1.0.15 IIRC). The operator sources the readinessProbe version from an environment variable, as seen in this helm chart: https://github.com/mongodb/helm-charts/blob/a9cd1a8945ab98dfdc6e1f99c169822a6dacd7ab/charts/community-operator/values.yaml#L67
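
For anyone pinning this on an older setup, a hedged sketch of the helm values override that selects the readiness probe version. The key names are assumed from the linked values.yaml as I recall them, so double-check them for your chart version; the chart turns this into an environment variable (READINESS_PROBE_IMAGE, IIRC) on the operator Deployment.

```yaml
# values override for the community-operator helm chart (key names assumed from
# the linked values.yaml; verify against your chart version).
readinessProbe:
  name: mongodb-kubernetes-readinessprobe
  version: 1.0.15    # the release mentioned above as containing the fix
```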

PR: #1224

I am closing this issue.

If there are issues with the readinessProbe marking the pod as unready when it should be ready, it is most likely not because of the above reason but something else.