mongodb / mongodb-kubernetes-operator

MongoDB Community Kubernetes Operator

mongodb-agent container is not coming into ready state

balram1988 opened this issue · comments

I have deployed the MongoDB Community operator on an EKS cluster. Now I am trying to deploy a MongoDB cluster with 2 replicas, using the sample YAMLs as a reference. Here is the YAML I am using to install the cluster:

---
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-user-password
  namespace: dev-3pp-mongodb
type: Opaque
stringData:
  password: admin
---
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-community
  namespace: dev-3pp-mongodb
spec:
  members: 2
  type: ReplicaSet
  version: 6.0.5
  security:
    authentication:
      modes:
        - SCRAM
  users:
    - name: mongodb-user
      db: admin
      passwordSecretRef:
        name: mongodb-user-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: my-scram
  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            selector:
              matchLabels:
                type: data
            accessModes:
              - ReadWriteOnce
            storageClassName: ai-insights-mongodb-data
            resources:
              requests:
                storage: 10Gi
        - metadata:
            name: logs-volume
          spec:
            selector:
              matchLabels:
                type: logs
            accessModes:
              - ReadWriteOnce
            storageClassName: ai-insights-mongodb-logs
            resources:
              requests:
                storage: 5Gi
      template:
        spec:
          initContainers:
            - volumeMounts:
                - mountPath: /data
                  name: data-volume
                - mountPath: /logs
                  name: logs-volume
              name: change-dir-permissions
              image: busybox
          securityContext:
            fsGroup: 2000
            fsGroupChangePolicy: OnRootMismatch
            runAsUser: 2000
            runAsNonRoot: true
NAME                                           READY   STATUS    RESTARTS   AGE
mongodb-community-0                            1/2     Running   0          20h
mongodb-kubernetes-operator-86bbd49bc5-lf4q4   1/1     Running   0          20h

As you can see above, the second container, mongodb-agent, is not coming up.
Output of describing the pod:
Name:             mongodb-community-0
Namespace:        dev-3pp-mongodb
Priority:         0
Service Account:  mongodb-database
Node:             ip-10-101-3-34.eu-west-1.compute.internal/10.101.3.34
Start Time:       Wed, 31 Jan 2024 09:18:31 +0000
Labels:           app=mongodb-community-svc
                  controller-revision-hash=mongodb-community-7cfd597f6b
                  statefulset.kubernetes.io/pod-name=mongodb-community-0
Annotations:      agent.mongodb.com/version: -1
Status:           Running
IP:               10.101.24.33
IPs:
  IP:           10.101.24.33
Controlled By:  StatefulSet/mongodb-community
Init Containers:
  change-dir-permissions:
    Container ID:   containerd://3d78d7b896c0f6146ea15b88a33ce066f115c0bf111616a1d60f11c4878cbcbd
    Image:          busybox
    Image ID:       docker.io/library/busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 31 Jan 2024 09:18:40 +0000
      Finished:     Wed, 31 Jan 2024 09:18:40 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from data-volume (rw)
      /logs from logs-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdv4x (ro)
  mongod-posthook:
    Container ID:  containerd://9083dffeee3d289820b2e6ec7ab108166adce9fba3d6e36e017ed80cd3cd4e56
    Image:         quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.8
    Image ID:      quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook@sha256:3a6bb3fd6807964ead036749dbbb0ccff2189ecf3059855a362217a4b0e43603
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      version-upgrade-hook
      /hooks/version-upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 31 Jan 2024 09:18:44 +0000
      Finished:     Wed, 31 Jan 2024 09:18:44 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /hooks from hooks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdv4x (ro)
  mongodb-agent-readinessprobe:
    Container ID:  containerd://66324d10e3326e5385ab111dc530e31133476dee090574e06af453946abe2bf0
    Image:         quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.17
    Image ID:      quay.io/mongodb/mongodb-kubernetes-readinessprobe@sha256:5d6745fcf5b29a2098b634d06cbb070db25403f1346588533cc2d31c0c91cfab
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /probes/readinessprobe
      /opt/scripts/readinessprobe
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 31 Jan 2024 09:18:47 +0000
      Finished:     Wed, 31 Jan 2024 09:18:47 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/scripts from agent-scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdv4x (ro)
Containers:
  mongod:
    Container ID:  containerd://365ebaeae057209db85f708003c211acd1accac9b7a8e5c4733eed0d3964b100
    Image:         docker.io/mongo:6.0.5
    Image ID:      docker.io/library/mongo@sha256:928347070dc089a596f869a22a4204c0feace3eb03470a6a2de6814f11fb7309
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      
      #run post-start hook to handle version changes
      /hooks/version-upgrade
      
      # wait for config and keyfile to be created by the agent
       while ! [ -f /data/automation-mongod.conf -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep 3 ; done ; sleep 2 ;
      
      # start mongod with this configuration
      exec mongod -f /data/automation-mongod.conf;
      
      
    Args:
      
    State:          Running
      Started:      Wed, 31 Jan 2024 09:19:01 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:     500m
      memory:  400M
    Environment:
      AGENT_STATUS_FILEPATH:  /healthstatus/agent-health-status.json
    Mounts:
      /data from data-volume (rw)
      /healthstatus from healthstatus (rw)
      /hooks from hooks (rw)
      /tmp from tmp (rw)
      /var/lib/mongodb-mms-automation/authentication from mongodb-community-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdv4x (ro)
  mongodb-agent:
    Container ID:  containerd://cb01d8f53095f6cd6abc0563e02be27fd0a071b07e9c44abc786603fdb1adc93
    Image:         quay.io/mongodb/mongodb-agent:107.0.0.8465-1
    Image ID:      quay.io/mongodb/mongodb-agent@sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      current_uid=$(id -u)
      AGENT_API_KEY="$(cat /mongodb-automation/agent-api-key/agentApiKey)"
      declare -r current_uid
      if ! grep -q "${current_uid}" /etc/passwd ; then
      sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
      echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
      export NSS_WRAPPER_PASSWD=/tmp/passwd
      export LD_PRELOAD=libnss_wrapper.so
      export NSS_WRAPPER_GROUP=/etc/group
      fi
      agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile ${AGENT_LOG_FILE} -maxLogFileDurationHrs ${AGENT_MAX_LOG_FILE_DURATION_HOURS} -logLevel ${AGENT_LOG_LEVEL}
    State:          Running
      Started:      Wed, 31 Jan 2024 09:19:08 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:      500m
      memory:   400M
    Readiness:  exec [/opt/scripts/readinessprobe] delay=5s timeout=1s period=10s #success=1 #failure=40
    Environment:
      AGENT_LOG_FILE:                     /var/log/mongodb-mms-automation/automation-agent.log
      AGENT_LOG_LEVEL:                    INFO
      AGENT_MAX_LOG_FILE_DURATION_HOURS:  24
      AGENT_STATUS_FILEPATH:              /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
      AUTOMATION_CONFIG_MAP:              mongodb-community-config
      HEADLESS_AGENT:                     true
      POD_NAMESPACE:                      dev-3pp-mongodb (v1:metadata.namespace)
    Mounts:
      /data from data-volume (rw)
      /opt/scripts from agent-scripts (rw)
      /tmp from tmp (rw)
      /var/lib/automation/config from automation-config (ro)
      /var/lib/mongodb-mms-automation/authentication from mongodb-community-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
      /var/log/mongodb-mms-automation/healthstatus from healthstatus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pdv4x (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  logs-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  logs-volume-mongodb-community-0
    ReadOnly:   false
  data-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-volume-mongodb-community-0
    ReadOnly:   false
  agent-scripts:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  automation-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  mongodb-community-config
    Optional:    false
  healthstatus:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  hooks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  mongodb-community-keyfile:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-pdv4x:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Other information:
Operator Version - 0.9.0
MongoDB Image used - 6.0.5

Kubernetes Cluster Information
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:44:59Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.8-eks-8cb36c9", GitCommit:"fca3a8722c88c4dba573a903712a6feaf3c40a51", GitTreeState:"clean", BuildDate:"2023-11-22T21:52:13Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}

agent-health-status.json
automation-agent-verbose.log
automation-agent.log
cluster-config.json

Any help would be appreciated.

@balram1988 can your agents connect to the mongod cluster?

  • ssh into the mongodb-agent container
  • curl the mongod endpoint
bash-4.4$ curl <pod-name>.<service-name>.<namespace>.svc.cluster.local:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

In your example it should be:

mongodb-community-0.mongodb-community-svc.dev-3pp-mongodb.svc.cluster.local:27017
  • verify that your example works without security like auth and user creation
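The connectivity check above can be sketched as a small script. This is a sketch assuming the pod, service, and namespace names from this thread; the final `kubectl exec` line is left commented out because it requires access to the cluster:

```shell
# Names taken from this thread -- adjust for your own deployment.
POD="mongodb-community-0"
SVC="mongodb-community-svc"
NS="dev-3pp-mongodb"

# A headless service gives each pod a stable per-pod DNS name of the form
# <pod-name>.<service-name>.<namespace>.svc.cluster.local
HOST="${POD}.${SVC}.${NS}.svc.cluster.local"
echo "Probing ${HOST}:27017"

# From any pod inside the cluster (the operator pod works), curl the port.
# The reply "It looks like you are trying to access MongoDB over HTTP on the
# native driver port." confirms that DNS resolves and mongod is listening.
# kubectl exec -it <some-pod> -- curl "${HOST}:27017"
```

Note this name only resolves from inside the cluster; running curl from a workstation outside it will fail with a DNS error.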

@nammn thanks for the quick reply.

I tried to run the above command inside the agent container, but curl was not installed.

Here is the output when I tried the same from my environment and from the operator pod:

AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ curl mongodb-community-0.mongodb-community-svc.dev-3pp-mongodb.svc.cluster.local:27017
curl: (6) Could not resolve host: mongodb-community-0.mongodb-community-svc.dev-3pp-mongodb.svc.cluster.local

AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ k get svc
NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
mongodb-community-svc   ClusterIP   None         <none>        27017/TCP   44h

AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ k exec -it mongodb-kubernetes-operator-86bbd49bc5-lf4q4 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.

bash-4.4$ curl mongodb-community-svc:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

bash-4.4$ curl mongodb-community-0.mongodb-community-svc.dev-3pp-mongodb.svc.cluster.local:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

Regarding your second suggestion, sure, I can give it a try on my side and will update soon.

@nammn : FYI, I just checked without auth and user creation, and then the cluster does not deploy at all:

MongoDBCommunity/dev-3pp-mongodb/mongodb-community dry-run failed, reason: Invalid: MongoDBCommunity.mongodbcommunity.mongodb.com "mongodb-community" is invalid: [spec.security: Required value, spec.users: Required value]

Are you sure you were testing the correct agent image? The agent image should have curl:

❯ docker run -it --platform linux/amd64 quay.io/mongodb/mongodb-agent:107.0.0.8465-1  bash
Unable to find image 'quay.io/mongodb/mongodb-agent:107.0.0.8465-1' locally
107.0.0.8465-1: Pulling from mongodb/mongodb-agent
7a2c55901189: Pull complete
6cff85196383: Pull complete
1b5e0859ae65: Pull complete
b760e4e5a1d0: Pull complete
2bcdef562043: Pull complete
6ec9fa8aa6b5: Pull complete
25daf17bced4: Pull complete
cfd648f214ed: Pull complete
Digest: sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3
Status: Downloaded newer image for quay.io/mongodb/mongodb-agent:107.0.0.8465-1
I have no name!@01821f387086:/$ curl
curl: try 'curl --help' or 'curl --manual' for more information

But yes, it seems the svc is correctly configured and pointing to the pods.
Can you create a minimal reproducible example, removing things until it stops working?

Below is a working example from me:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongo
  namespace: nnguyen-evg-single
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.4"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: admin-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: admin-scram

# the user credentials will be generated from this secret
# once the credentials are generated, this secret is no longer required
---
apiVersion: v1
kind: Secret
metadata:
  name: admin-password
  namespace: nnguyen-evg-single
type: Opaque
stringData:
  password: "mongodb"

I tried the same example above before, and now I tried it again, but the pod is in Pending state because the PVCs are pending, waiting for PVs to be bound.


AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ k get pods 
NAME                                           READY   STATUS    RESTARTS   AGE
mongodb-0                                      0/2     Pending   0          6m9s
mongodb-kubernetes-operator-86bbd49bc5-prpvr   1/1     Running   0          20m
AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ k get pvc
NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-volume-mongodb-0   Pending                                                     6m18s
logs-volume-mongodb-0   Pending                                                     6m18s 

I can't deploy on my eaa-dev cluster without PVC configuration, because we can't use cluster resources directly. I tried the same locally on minikube, and that worked for me.

FYI, here are the logs from the mongodb-agent container that is not coming up on my eaa-dev cluster, which runs Kubernetes managed by ACS:

AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ k logs -f mongodb-community-0 -c mongodb-agent
cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory
[2024-02-05T09:21:43.251+0000] [.debug] [util/distros/distros.go:LinuxFlavorAndVersionUncached:144] Detected linux flavor ubuntu version 20.4 

Regarding the images the pod is trying to deploy:

AWSReservedSSO_SWE-Dev_Cloud9_Role_75beb0b0ead7f802:~/environment $ kdp mongodb-community-0 | grep -i image
Image:          quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.8
Image ID:       quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook@sha256:b668e517f9c54ee64b8fefdc042ae4edbe5f40f900152f4b1abcdeb4f1a72462
Image:          quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.17
Image ID:       quay.io/mongodb/mongodb-kubernetes-readinessprobe@sha256:560bb12acb24c66c19ebcc01a7c39736d1fc505eff14588029485e662872f694
Image:          docker.io/mongo:7.0.4
Image ID:       docker.io/library/mongo@sha256:d14158139a0bbc1741136d3eded7bef018a5980760a57f0014a1d4ac7677e4b1
Image:          quay.io/mongodb/mongodb-agent:107.0.0.8465-1
Image ID:       quay.io/mongodb/mongodb-agent@sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3

Closing this issue, as it was occurring due to the default 1-second timeout on the readinessProbe. After increasing the timeout to 10 seconds, the mongodb-agent container came up.

@nammn : Thanks for your help.
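For reference, the fix above can be expressed declaratively through the CRD's statefulSet override rather than by patching the StatefulSet by hand. This is a sketch: the container name `mongodb-agent` matches the describe output earlier in this thread, and the `readinessProbe` field follows the standard Kubernetes container spec, merged by the operator into the generated StatefulSet:

```yaml
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-community
  namespace: dev-3pp-mongodb
spec:
  # ... members, type, version, security, users as in the original manifest ...
  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongodb-agent
              readinessProbe:
                timeoutSeconds: 10  # default is 1s, which can be too short on slow nodes
```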

Facing the same issue; updating the timeout to 20 seconds did not help.
I can execute the readinessprobe within the agent pod, but it still does not come up.