Current Mongo DB Members is 0 though >0 are configured and running
janluak opened this issue
Hey guys,
thanks for the help in advance!
What did you do to encounter the bug?
Interestingly, I have this issue in only one environment; in another cluster it works perfectly fine → I can't pinpoint the difference, only keep searching for it.
The manifest looks like this:
```yaml
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: db-mongodb
  namespace: db-mongodb
  annotations:
    argocd.argoproj.io/sync-wave: "5"
#  labels:
#    instance: mongodb
#    name: db-mongodb
#    app: db-mongodb-svc
spec:
  members: 1
  type: ReplicaSet
  version: 7.0.3
  statefulSet:
    spec:
#      metadata:
#        labels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
#      selector:
#        matchLabels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
      template:
#        labels:
#          instance: mongodb
#          name: db-mongodb
#          app: db-mongodb-svc
        spec:
          containers:
            - name: mongod
              resources:
                limits:
                  cpu: "1"
                  memory: 500M
                requests:
                  cpu: 250m
                  memory: 300M
            - name: mongodb-agent
              resources:
                limits:
                  cpu: "1"
                  memory: 500M
                requests:
                  cpu: 250m
                  memory: 300M
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        - metadata:
            name: logs-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 500Mi
  security:
    authentication:
      modes: ["SCRAM-SHA-1", "SCRAM"]
  users:
    - name: initial-admin
      db: admin
      passwordSecretRef:
        name: my-secret
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
        - name: dbAdminAnyDatabase
          db: admin
        - name: readWriteAnyDatabase
          db: admin
        - name: backup
          db: admin
        - name: restore
          db: admin
      scramCredentialsSecretName: scrams
```
I already tried playing with the labels (see #904), but unfortunately it didn't help.
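Since #904 points at label/selector mismatches, one quick sanity check (a sketch - adjust names and namespace to your setup) is to compare the stateful set's selector with the labels actually present on the pod:

```sh
# Compare the stateful set's pod selector with the labels on the running pod;
# any mismatch here would keep the stateful set from counting the pod as its own.
kubectl get statefulset db-mongodb -n db-mongodb -o jsonpath='{.spec.selector.matchLabels}'; echo
kubectl get pod db-mongodb-0 -n db-mongodb -o jsonpath='{.metadata.labels}'; echo
```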
What did you expect?
When inspecting the `mongodbcommunity` resource, the status field `Current Mongo DB Members` should equal the number of live and ready pods of the correlating stateful set.
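A quick way to compare the two numbers side by side (a sketch using the resource names from the manifest above):

```sh
# Status reported by the custom resource vs. ready replicas of the stateful set;
# the two values should match once the replica set is healthy.
kubectl get mongodbcommunity db-mongodb -n db-mongodb -o jsonpath='{.status.currentMongoDBMembers}'; echo
kubectl get statefulset db-mongodb -n db-mongodb -o jsonpath='{.status.readyReplicas}'; echo
```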
What happened instead?
The pods are running and the stateful set is ready, yet the `mongodbcommunity` resource is still in state `Pending`. This leaves the undesired behavior of not being able to connect to it via the `mongodb+srv` URL.
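For reference, the operator publishes the connection strings in a generated user secret; with the spec above the name should follow the `<resource>-<db>-<user>` pattern, though it's worth verifying the exact name with `kubectl get secrets` first:

```sh
# Pull the SRV connection string out of the generated user secret
# (assumed name pattern: <resource-name>-<auth-db>-<username>).
kubectl get secret db-mongodb-admin-initial-admin -n db-mongodb \
  -o jsonpath='{.data.connectionString\.standardSrv}' | base64 -d; echo
```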
Interestingly, when creating or deleting the `mongodbcommunity` resource, the stateful set is correctly created or deleted as well → the operator seems to do its job correctly. On the other hand, the operator's check on the resource's stateful set seems to fail.
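As far as I understand, that readiness check is driven by the agent's health status file (the same file the readiness probe reads - see `AGENT_STATUS_FILEPATH` in the stateful set output below), so dumping it shows what the check actually sees; a sketch:

```sh
# Print the agent health status file the readiness probe evaluates.
kubectl exec db-mongodb-0 -n db-mongodb -c mongodb-agent -- \
  cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
```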
Output
```
kubectl describe mongodbcommunity/db-mongodb

Name:         db-mongodb
Namespace:    db-mongodb
Labels:       app=db-mongodb-svc
              app.kubernetes.io/managed-by=Helm
              instance=mongodb
              name=db-mongodb
Annotations:  argocd.argoproj.io/sync-wave: 5
              meta.helm.sh/release-name: mongodb
              meta.helm.sh/release-namespace: db-mongodb
API Version:  mongodbcommunity.mongodb.com/v1
Kind:         MongoDBCommunity
Metadata:
  Creation Timestamp:  2023-11-22T09:54:40Z
  Generation:          1
  Resource Version:    20129540
  UID:                 f60310d6-4f74-49ca-89ba-4e1270145437
Spec:
  Members:  1
  Security:
    Authentication:
      Ignore Unknown Users:  true
      Modes:
        SCRAM-SHA-1
        SCRAM
  Stateful Set:
    Spec:
      Metadata:
        Labels:
          App:       db-mongodb-svc
          Instance:  mongodb
          Name:      db-mongodb
      Selector:
        Match Labels:
          App:       db-mongodb-svc
          Instance:  mongodb
          Name:      db-mongodb
      Template:
        Metadata:
          Labels:
            App:       db-mongodb-svc
            Instance:  mongodb
            Name:      db-mongodb
        Spec:
          Containers:
            Name:  mongod
            Resources:
              Limits:
                Cpu:     1
                Memory:  500M
              Requests:
                Cpu:     250m
                Memory:  300M
            Name:  mongodb-agent
            Resources:
              Limits:
                Cpu:     1
                Memory:  500M
              Requests:
                Cpu:     250m
                Memory:  300M
      Volume Claim Templates:
        Metadata:
          Name:  data-volume
        Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  5Gi
        Metadata:
          Name:  logs-volume
        Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  500Mi
  Type:  ReplicaSet
  Users:
    Db:    admin
    Name:  initial-admin
    Password Secret Ref:
      Name:  my-secret
    Roles:
      Db:    admin
      Name:  clusterAdmin
      Db:    admin
      Name:  userAdminAnyDatabase
      Db:    admin
      Name:  dbAdminAnyDatabase
      Db:    admin
      Name:  readWriteAnyDatabase
      Db:    admin
      Name:  backup
      Db:    admin
      Name:  restore
    Scram Credentials Secret Name:  scrams
  Version:                          7.0.3
Status:
  Current Mongo DB Members:       0
  Current Stateful Set Replicas:  0
  Message:                        ReplicaSet is not yet ready, retrying in 10 seconds
  Mongo Uri:
  Phase:                          Pending
Events:  <none>
```
```
kubectl describe statefulset/db-mongodb

Name:               db-mongodb
Namespace:          db-mongodb
CreationTimestamp:  Wed, 22 Nov 2023 10:54:40 +0100
Selector:           app=db-mongodb-svc,instance=mongodb,name=db-mongodb
Labels:             app=db-mongodb-svc,instance=mongodb,name=db-mongodb
Annotations:        <none>
Replicas:           1 desired | 1 total
Update Strategy:    RollingUpdate
Pods Status:        1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=db-mongodb-svc
                    instance=mongodb
                    name=db-mongodb
  Service Account:  mongodb-database
  Init Containers:
   mongod-posthook:
    Image:      quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.7
    Port:       <none>
    Host Port:  <none>
    Command:
      cp
      version-upgrade-hook
      /hooks/version-upgrade
    Environment:  <none>
    Mounts:
      /hooks from hooks (rw)
   mongodb-agent-readinessprobe:
    Image:      quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.15
    Port:       <none>
    Host Port:  <none>
    Command:
      cp
      /probes/readinessprobe
      /opt/scripts/readinessprobe
    Environment:  <none>
    Mounts:
      /opt/scripts from agent-scripts (rw)
  Containers:
   mongod:
    Image:      docker.io/mongo:7.0.3
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      #run post-start hook to handle version changes
      /hooks/version-upgrade
      # wait for config and keyfile to be created by the agent
      while ! [ -f /data/automation-mongod.conf -a -f /var/lib/mongodb-mms-automation/authentication/keyfile ]; do sleep 3 ; done ; sleep 2 ;
      # start mongod with this configuration
      exec mongod -f /data/automation-mongod.conf;
    Args:
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:     250m
      memory:  300M
    Environment:
      AGENT_STATUS_FILEPATH:  /healthstatus/agent-health-status.json
    Mounts:
      /data from data-volume (rw)
      /healthstatus from healthstatus (rw)
      /hooks from hooks (rw)
      /tmp from tmp (rw)
      /var/lib/mongodb-mms-automation/authentication from db-mongodb-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
   mongodb-agent:
    Image:      quay.io/mongodb/mongodb-agent:12.0.24.7719-1
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      current_uid=$(id -u)
      AGENT_API_KEY="$(cat /mongodb-automation/agent-api-key/agentApiKey)"
      declare -r current_uid
      if ! grep -q "${current_uid}" /etc/passwd ; then
        sed -e "s/^mongodb:/builder:/" /etc/passwd > /tmp/passwd
        echo "mongodb:x:$(id -u):$(id -g):,,,:/:/bin/bash" >> /tmp/passwd
        export NSS_WRAPPER_PASSWD=/tmp/passwd
        export LD_PRELOAD=libnss_wrapper.so
        export NSS_WRAPPER_GROUP=/etc/group
      fi
      agent/mongodb-agent -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json -serveStatusPort=5000 -cluster=/var/lib/automation/config/cluster-config.json -skipMongoStart -noDaemonize -useLocalMongoDbTools -logFile ${AGENT_LOG_FILE} -maxLogFileDurationHrs ${AGENT_MAX_LOG_FILE_DURATION_HOURS} -logLevel ${AGENT_LOG_LEVEL}
    Limits:
      cpu:     1
      memory:  500M
    Requests:
      cpu:     250m
      memory:  300M
    Readiness:  exec [/opt/scripts/readinessprobe] delay=5s timeout=1s period=10s #success=1 #failure=40
    Environment:
      AGENT_LOG_FILE:                     /var/log/mongodb-mms-automation/automation-agent.log
      AGENT_LOG_LEVEL:                    INFO
      AGENT_MAX_LOG_FILE_DURATION_HOURS:  24
      AGENT_STATUS_FILEPATH:              /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
      AUTOMATION_CONFIG_MAP:              db-mongodb-config
      HEADLESS_AGENT:                     true
      POD_NAMESPACE:                      (v1:metadata.namespace)
    Mounts:
      /data from data-volume (rw)
      /opt/scripts from agent-scripts (rw)
      /tmp from tmp (rw)
      /var/lib/automation/config from automation-config (ro)
      /var/lib/mongodb-mms-automation/authentication from db-mongodb-keyfile (rw)
      /var/log/mongodb-mms-automation from logs-volume (rw)
      /var/log/mongodb-mms-automation/healthstatus from healthstatus (rw)
  Volumes:
   agent-scripts:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   automation-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  db-mongodb-config
    Optional:    false
   db-mongodb-keyfile:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   healthstatus:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   hooks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:
  Name:          data-volume
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      5Gi
  Access Modes:  [ReadWriteOnce]
  Name:          logs-volume
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      500Mi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age                 From                    Message
  ----    ------            ----                ----                    -------
  Normal  InjectionSkipped  14m (x12 over 14m)  linkerd-proxy-injector  Linkerd sidecar proxy injection skipped: neither the namespace nor the pod have the annotation "linkerd.io/inject:enabled"
  Normal  SuccessfulCreate  14m                 statefulset-controller  create Pod db-mongodb-0 in StatefulSet db-mongodb successful
```
Operator Information
I have the following versions installed:
- community-operator-crds: 0.8.3 (tried with 0.9.0 as well)
- community-operator: 0.8.3
- MongoDB: 6.0.8 (tried 7.0.3 as well)
Kubernetes Cluster Information
- Distribution: k3s
- Version: 1.28.2
- Image Registry location: quay
Question
How does the `mongodbcommunity` resource select the stateful set? Are there any defaults I should set, in case they got lost for some reason?
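For what it's worth, a way to see which resource the stateful set belongs to (assuming the operator sets an ownerReference, which I believe it does):

```sh
# Show which object owns the stateful set (expected: the MongoDBCommunity resource).
kubectl get statefulset db-mongodb -n db-mongodb \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
```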
btw: the reference to a replica set in the status message is throwing me off a bit. Shouldn't it be the stateful set?

```
Current Stateful Set Replicas:  0
Message:                        ReplicaSet is not yet ready, retrying in 10 seconds
```
Not sure if this is connected or just another issue: when trying to tell Vault to write a database config (even pointing at one of the running pods directly instead of the SRV record), it returns this error:

```
current topology: { Type: ReplicaSetNoPrimary, Servers: [...] }
```
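To see why the driver reports `ReplicaSetNoPrimary`, it might help to check the replica set state from inside the pod (a sketch; `mongosh` ships in the `mongo:7` image, and with SCRAM enabled you'll need to authenticate as the admin user):

```sh
# Inspect the replica set state directly on the member; "myState": 1 means PRIMARY.
kubectl exec -it db-mongodb-0 -n db-mongodb -c mongod -- \
  mongosh -u initial-admin -p '<password>' --authenticationDatabase admin --eval 'rs.status()'
```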
I just deleted the persistent volumes and their correlating claims - and now it works...
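For anyone hitting the same thing, this is roughly what that looks like here (PVC names follow the standard `<claim-template>-<pod>` convention; double-check with the `get` first):

```sh
# List the claims, then delete them; the bound PVs are removed too when the
# storage class uses the Delete reclaim policy.
kubectl get pvc -n db-mongodb
kubectl delete pvc data-volume-db-mongodb-0 logs-volume-db-mongodb-0 -n db-mongodb
```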
I have no clue what the storage has to do with the mismatch between the `mongodbcommunity` resource and the stateful set. For the moment it is working, but it would be highly appreciated if somebody could shed some light on this ;)
Hi @janluak - do you happen to have the operator logs from when the MongoDBCommunity resource was reporting the `Pending` state?
Unfortunately, no. Ever since deleting the PVs solved it, the MongoDB has been running as desired. The logs weren't kept that long :/
Got it - it's pretty hard to debug without the operator logs. I am going to close this issue for now; if you encounter it again, feel free to re-open it with the operator logs.
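In case it happens again, something like this should capture them (a sketch assuming the default operator deployment name and namespace - adjust both to your install):

```sh
# Stream the operator logs to a file so they survive pod restarts.
kubectl logs deployment/mongodb-kubernetes-operator -n db-mongodb -f | tee operator.log
```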