Current Mongo DB Members is 0 though >0 are configured and running
janluak opened this issue
Hey guys,
thanks for the help in advance!
What did you do to encounter the bug?
Interestingly, I have this issue in only one environment; in another cluster it works perfectly fine → I can't pinpoint the difference, only keep searching for it.
The manifest looks like this:
```yaml
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: db-mongodb
  namespace: db-mongodb
  annotations:
    argocd.argoproj.io/sync-wave: "5"
#  labels:
#    instance: mongodb
#    name: db-mongodb
#    app: db-mongodb-svc
spec:
  members: 1
  type: ReplicaSet
  version: 7.0.3
  statefulSet:
    spec:
#      metadata:
#        labels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
#      selector:
#        matchLabels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
      template:
#        labels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
        spec:
          containers:
            - name: mongod
              resources:
                limits:
                  cpu: "1"
                  memory: 500M
                requests:
                  cpu: 250m
                  memory: 300M
            - name: mongodb-agent
              resources:
                limits:
                  cpu: "1"
                  memory: 500M
                requests:
                  cpu: 250m
                  memory: 300M
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        - metadata:
            name: logs-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 500Mi
  security:
    authentication:
      modes: ["SCRAM-SHA-1", "SCRAM"]
  users:
    - name: initial-admin
      db: admin
      passwordSecretRef:
        name: my-secret
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
        - name: dbAdminAnyDatabase
          db: admin
        - name: readWriteAnyDatabase
          db: admin
        - name: backup
          db: admin
        - name: restore
          db: admin
      scramCredentialsSecretName: scrams
```
I already tried playing with the labels (see #904), but unfortunately it didn't help.
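Since #904 points at label/selector mismatches, one quick sanity check (a sketch - adjust names and namespace to your setup) is to compare the stateful set's selector with the labels actually present on the pod:

```sh
# Compare the stateful set's pod selector with the labels on the running pod;
# any mismatch here would keep the stateful set from counting the pod as its own.
kubectl get statefulset db-mongodb -n db-mongodb -o jsonpath='{.spec.selector.matchLabels}'; echo
kubectl get pod db-mongodb-0 -n db-mongodb -o jsonpath='{.metadata.labels}'; echo
```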
What did you expect?
When inspecting the `mongodbcommunity` resource, the status field `Current Mongo DB Members` should equal the number of live and ready pods of the correlating stateful set.
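A quick way to compare the two numbers side by side (a sketch using the resource names from the manifest above):

```sh
# Status reported by the custom resource vs. ready replicas of the stateful set;
# the two values should match once the replica set is healthy.
kubectl get mongodbcommunity db-mongodb -n db-mongodb -o jsonpath='{.status.currentMongoDBMembers}'; echo
kubectl get statefulset db-mongodb -n db-mongodb -o jsonpath='{.status.readyReplicas}'; echo
```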
What happened instead?
The pods are running and the stateful set is ready, yet the `mongodbcommunity` resource is still in state `Pending`. This leaves the undesired behavior of not being able to connect to it via the `mongodb+srv` URL.
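For reference, the operator publishes the connection strings in a generated user secret; with the spec above the name should follow the `<resource>-<db>-<user>` pattern, though it's worth verifying the exact name with `kubectl get secrets` first:

```sh
# Pull the SRV connection string out of the generated user secret
# (assumed name pattern: <resource-name>-<auth-db>-<username>).
kubectl get secret db-mongodb-admin-initial-admin -n db-mongodb \
  -o jsonpath='{.data.connectionString\.standardSrv}' | base64 -d; echo
```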
Interestingly, when creating or deleting the `mongodbcommunity` resource, the stateful set is correctly created or deleted as well → the operator seems to do its job correctly. On the other hand, the operator's check on the resource's stateful set seems to fail.
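As far as I understand, that readiness check is driven by the agent's health status file (the same file the readiness probe reads - see `AGENT_STATUS_FILEPATH` in the stateful set output below), so dumping it shows what the check actually sees; a sketch:

```sh
# Print the agent health status file the readiness probe evaluates.
kubectl exec db-mongodb-0 -n db-mongodb -c mongodb-agent -- \
  cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
```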
Output
```
kubectl describe mongodbcommunity/db-mongodb

Name:         db-mongodb
Namespace:    db-mongodb
Labels:       app=db-mongodb-svc
              app.kubernetes.io/managed-by=Helm
              instance=mongodb
              name=db-mongodb
Annotations:  argocd.argoproj.io/sync-wave: 5
              meta.helm.sh/release-name: mongodb
              meta.helm.sh/release-namespace: db-mongodb
API Version:  mongodbcommunity.mongodb.com/v1
Kind:         MongoDBCommunity
Metadata:
  Creation Timestamp:  2023-11-22T09:54:40Z
  Generation:          1
  Resource Version:    20129540
  UID:                 f60310d6-4f74-49ca-89ba-4e1270145437
Spec:
  Members:  1
  Security:
    Authentication:
      Ignore Unknown Users:  true
      Modes:
        SCRAM-SHA-1
        SCRAM
  Stateful Set:
    Spec:
      Metadata:
        Labels:
          App:       db-mongodb-svc
          Instance:  mongodb
          Name:      db-mongodb
      Selector:
        Match Labels:
          App:       db-mongodb-svc
          Instance:  mongodb
          Name:      db-mongodb
      Template:
        Metadata:
          Labels:
            App:       db-mongodb-svc
            Instance:  mongodb
            Name:      db-mongodb
        Spec:
          Containers:
            Name:  mongod
            Resources:
              Limits:
                Cpu:     1
                Memory:  500M
              Requests:
                Cpu:     250m
                Memory:  300M
            Name:  mongodb-agent
            Resources:
              Limits:
                Cpu:     1
                Memory:  500M
              Requests:
                Cpu:     250m
                Memory:  300M
      Volume Claim Templates:
        Metadata:
          Name:  data-volume
        Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  5Gi
        Metadata:
          Name:  logs-volume
        Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  500Mi
  Type:  ReplicaSet
  Users:
    Db:    admin
    Name:  initial-admin
    Password Secret Ref:
      Name:  my-secret
    Roles:
      Db:    admin
      Name:  clusterAdmin
      Db:    admin
      Name:  userAdminAnyDatabase
      Db:    admin
      Name:  dbAdminAnyDatabase
      Db:    admin
      Name:  readWriteAnyDatabase
      Db:    admin
      Name:  backup
      Db:    admin
      Name:  restore
    Scram Credentials Secret Name:  scrams
  Version:                          7.0.3
Status:
  Current Mongo DB Members:       0
  Current Stateful Set Replicas:  0
  Message:                        ReplicaSet is not yet ready, retrying in 10 seconds
  Mongo Uri:
  Phase:                          Pending
Events:  <none>
```
```
kubectl describe statefulset/db-mongodb

Name:               db-mongodb
Namespace:          db-mongodb
CreationTimestamp:  Wed, 22 Nov 2023 10:54:40 +0100
Selector:           app=db-mongodb-svc,instance=mongodb,name=db-mongodb
Labels:             app=db-mongodb-svc,instance=mongodb,name=db-mongodb
Annotations:        <none>
Replicas:           1 desired | 1 total
Update Strategy:    RollingUpdate
Pods Status:        1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=db-mongodb-svc
                    instance=mongodb
                    name=db-mongodb
  Service Account:  mongodb-database
  Init Containers:
   mongod-posthook:
    Image:      quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.7
    Port:       <none>
    Host Port:  <none>
    Command:
      cp
      version-upgrade-hook
      /hooks/version-upgrade
    Environment:  <none>
    Mounts:
      /hooks from hooks (rw)
   mongodb-agent-readinessprobe:
    Image:      quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.15
    Port:       <none>
    Host Port:  <none>
    Command:
      cp
      /probes/readinessprobe
      /opt/scripts/readinessprobe
    Environment:  <none>
    Mounts:
      /opt/scripts from agent-scripts (rw)
  Containers:
   mongod:
    Image:      docker.io/mongo:7.0.3
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      #run post-start hook to handle version changes
      /hooks/version-upgrade
      # wait for config and keyfile to be created by the agent
      while ! [ -f /data/automation-mongod.conf -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep 3 ; done ; sleep 2 ;
      # start mongod with this configuration
      exec mongod -f /data/automation-mongod.conf;
    Args:
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:     250m
      memory:  300M
    Environment:
      AGENT_STATUS_FILEPATH:  /healthstatus/agent-health-status.json
    Mounts:
      /data from data-volume (rw)
      /healthstatus from healthstatus (rw)
      /hooks from hooks (rw)
      /tmp from tmp (rw)
      /var/lib/mongodb-mms-automation/authentication from db-mongodb-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
   mongodb-agent:
    Image:      quay.io/mongodb/mongodb-agent:12.0.24.7719-1
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      current_uid=$(id -u)
      AGENT_API_KEY="$(cat /mongodb-automation/agent-api-key/agentApiKey)"
      declare -r current_uid
      if ! grep -q "${current_uid}" /etc/passwd ; then
        sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
        echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
        export NSS_WRAPPER_PASSWD=/tmp/passwd
        export LD_PRELOAD=libnss_wrapper.so
        export NSS_WRAPPER_GROUP=/etc/group
      fi
      agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile ${AGENT_LOG_FILE} -maxLogFileDurationHrs ${AGENT_MAX_LOG_FILE_DURATION_HOURS} -logLevel ${AGENT_LOG_LEVEL}
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:     250m
      memory:  300M
    Readiness:  exec [/opt/scripts/readinessprobe] delay=5s timeout=1s period=10s #success=1 #failure=40
    Environment:
      AGENT_LOG_FILE:                     /var/log/mongodb-mms-automation/automation-agent.log
      AGENT_LOG_LEVEL:                    INFO
      AGENT_MAX_LOG_FILE_DURATION_HOURS:  24
      AGENT_STATUS_FILEPATH:              /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
      AUTOMATION_CONFIG_MAP:              db-mongodb-config
      HEADLESS_AGENT:                     true
      POD_NAMESPACE:                      (v1:metadata.namespace)
    Mounts:
      /data from data-volume (rw)
      /opt/scripts from agent-scripts (rw)
      /tmp from tmp (rw)
      /var/lib/automation/config from automation-config (ro)
      /var/lib/mongodb-mms-automation/authentication from db-mongodb-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
      /var/log/mongodb-mms-automation/healthstatus from healthstatus (rw)
  Volumes:
   agent-scripts:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   automation-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  db-mongodb-config
    Optional:    false
   db-mongodb-keyfile:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   healthstatus:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   hooks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:
  Name:          data-volume
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      5Gi
  Access Modes:  [ReadWriteOnce]
  Name:          logs-volume
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      500Mi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age                 From                    Message
  ----    ------            ----                ----                    -------
  Normal  InjectionSkipped  14m (x12 over 14m)  linkerd-proxy-injector  Linkerd sidecar proxy injection skipped: neither the namespace nor the pod have the annotation "linkerd.io/inject:enabled"
  Normal  SuccessfulCreate  14m                 statefulset-controller  create Pod db-mongodb-0 in StatefulSet db-mongodb successful
```
Operator Information
I have the following versions installed:
- community-operator-crds: 0.8.3 (tried with 0.9.0 as well)
- community-operator: 0.8.3
- MongoDB: 6.0.8 (tried 7.0.3 as well)
Kubernetes Cluster Information
- Distribution: k3s
- Version: 1.28.2
- Image Registry location: quay
Question
How does the `mongodbcommunity` resource select the stateful set? Are there any defaults I should set, in case they got lost for some reason?
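For what it's worth, a way to see which resource the stateful set belongs to (assuming the operator sets an ownerReference, which I believe it does):

```sh
# Show which object owns the stateful set (expected: the MongoDBCommunity resource).
kubectl get statefulset db-mongodb -n db-mongodb \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
```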
btw: the reference to a replica set in the status message is throwing me off a bit. Shouldn't it be the stateful set?

```
Current Stateful Set Replicas:  0
Message:                        ReplicaSet is not yet ready, retrying in 10 seconds
```
Not sure if this is connected or just another issue: when trying to tell Vault to write a database config (even pointing at one of the running pods directly instead of the SRV record), it returns this error:

```
current topology: { Type: ReplicaSetNoPrimary, Servers: [...] }
```
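To see why the driver reports `ReplicaSetNoPrimary`, it might help to check the replica set state from inside the pod (a sketch; `mongosh` ships in the `mongo:7` image, and with SCRAM enabled you'll need to authenticate as the admin user):

```sh
# Inspect the replica set state directly on the member; "myState": 1 means PRIMARY.
kubectl exec -it db-mongodb-0 -n db-mongodb -c mongod -- \
  mongosh -u initial-admin -p '<password>' --authenticationDatabase admin --eval 'rs.status()'
```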
I just deleted the persistent volumes and their correlating claims - and now it works...
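For anyone hitting the same thing, this is roughly what that looks like here (PVC names follow the standard `<claim-template>-<pod>` convention; double-check with the `get` first):

```sh
# List the claims, then delete them; the bound PVs are removed too when the
# storage class uses the Delete reclaim policy.
kubectl get pvc -n db-mongodb
kubectl delete pvc data-volume-db-mongodb-0 logs-volume-db-mongodb-0 -n db-mongodb
```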
I have no clue what the storage has to do with the mismatch between the `mongodbcommunity` resource and the stateful set. For the moment it is working, but it would be highly appreciated if somebody could shed some light on this ;)
Hi @janluak - do you happen to have the operator logs from when the MongoDBCommunity resource was reporting the `Pending` state?
Unfortunately, no. Ever since deleting the PVs solved it, the MongoDB has been running as desired. The logs weren't kept that long :/
Got it - it's pretty hard to debug without the operator logs. I am going to close this issue for now; if you encounter it again, feel free to re-open it with the operator logs.
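In case it happens again, something like this should capture them (a sketch assuming the default operator deployment name and namespace - adjust both to your install):

```sh
# Stream the operator logs to a file so they survive pod restarts.
kubectl logs deployment/mongodb-kubernetes-operator -n db-mongodb -f | tee operator.log
```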