How to schedule replicas and persistent volume in different availability zones
sebinnsebastiann opened this issue
What did you do to encounter the bug?
Steps to reproduce the behavior:
I have an AWS EKS cluster and I deployed a MongoDB replica set using operator version 0.9.0. The replica set deployment manifest I used is given below:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-test
  namespace: mongodb-test
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.0"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: mongouser
      db: admin
      passwordSecretRef: # a reference to the secret that will be used to generate the user's password
        name: password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: mongodb-scram
  statefulSet:
    spec:
      template:
        spec:
          nodeSelector:
            server: mongo
          # resources can be specified by applying an override
          # per container name.
          containers:
            - name: mongod
              resources:
                limits:
                  cpu: "0.3"
                  memory: 700M
                requests:
                  cpu: "0.2"
                  memory: 500M
            - name: mongodb-agent
              resources:
                limits:
                  cpu: "0.2"
                  memory: 500M
                requests:
                  cpu: "0.1"
                  memory: 250M
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: sc1
            resources:
              requests:
                storage: 10Gi
        - metadata:
            name: logs-volume
          spec:
            storageClassName: sc1
            resources:
              requests:
                storage: 2Gi
Below are the commands I used to deploy MongoDB:
kubectl apply -f mongodb-kubernetes-operator/crd/mongodbcommunity.mongodb.com_mongodbcommunity.yaml
kubectl get crd/mongodbcommunity.mongodbcommunity.mongodb.com
kubectl apply -k mongodb-kubernetes-operator/rbac/ --namespace mongodb-test
kubectl create -f mongodb-kubernetes-operator/manager/manager.yaml --namespace mongodb-test
kubectl apply -f replicaset/rbac -n mongodb-test
kubectl apply -f replicaset/replica-set.yaml -n mongodb-test
What did you expect?
- I want the MongoDB replicas to be scheduled in different availability zones.
- Each replica should be scheduled in the same availability zone as its persistent volume, e.g.:
  replica1 and replica1's persistent volume in eu-west-1a
  replica2 and replica2's persistent volume in eu-west-1b
  replica3 and replica3's persistent volume in eu-west-1c
What happened instead?
All the replicas and all of their persistent volumes were scheduled in the same availability zone.
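For reference, the placement can be confirmed with standard kubectl commands; the namespace and PVC name below match the manifest above (the PVC name follows the usual claim-template-plus-pod-name pattern), and topology.kubernetes.io/zone is the well-known node topology label:
# show which node each replica pod landed on
kubectl get pods -n mongodb-test -o wide
# show the availability zone label of each node
kubectl get nodes -L topology.kubernetes.io/zone
# inspect a bound PersistentVolume's node affinity to see which AZ it is pinned to
kubectl describe pv $(kubectl get pvc data-volume-mongodb-test-0 -n mongodb-test -o jsonpath='{.spec.volumeName}')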
Operator Information
- Operator Version: v0.9.0
- MongoDB Image used: quay.io/mongodb/mongodb-community-server:7.0.0-ubi8
Kubernetes Cluster Information
- Distribution: AWS EKS
- Version: v1.26.11-eks-8cb36c9
Hey, you can set spec.statefulSet.spec.template.spec.affinity.podAntiAffinity in the MongoDB resource, or even topologySpreadConstraints; see the sketch below.
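For illustration, a minimal sketch of such a topologySpreadConstraints override in the MongoDBCommunity resource; the pod label used in the selector (app: mongodb-test-svc) is an assumption about the labels the operator puts on the replica set pods, so verify it with kubectl get pods --show-labels before relying on it:
spec:
  statefulSet:
    spec:
      template:
        spec:
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone   # spread pods across availability zones
              whenUnsatisfiable: DoNotSchedule
              labelSelector:
                matchLabels:
                  app: mongodb-test-svc   # assumed pod label; check with kubectl get pods --show-labels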
But after a pod restart, the pod may be scheduled in az1 while its persistent volume is in az2, which causes a volume node affinity error. How can we tackle this issue?
The kube-scheduler accounts for the fact that the volume is in a specific AZ and will schedule the pod in the correct AZ.
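For context, a dynamically provisioned EBS-backed PersistentVolume carries a nodeAffinity stanza that pins it to one zone, roughly like the sketch below; the topology key shown assumes the AWS EBS CSI driver and the zone value is illustrative. This is what the scheduler consults when placing the pod:
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone   # assumed CSI topology key; in-tree EBS volumes use topology.kubernetes.io/zone
              operator: In
              values:
                - eu-west-1a   # illustrative zone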
If a pod is rescheduled, deleted and recreated, or the instance where the pod was running is terminated, and the pod reuses an existing EBS volume, there is still a chance that the pod will be scheduled in an AZ where the EBS volume doesn't exist.
I'm getting a volume node affinity conflict error in the above scenario. The kube-scheduler isn't able to schedule the pod in the correct AZ where the volume exists.
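One mitigation worth checking is the StorageClass binding mode: with volumeBindingMode: WaitForFirstConsumer, the volume is only provisioned after the pod has been scheduled, so it is created in the pod's AZ rather than an arbitrary one. A sketch, assuming the AWS EBS CSI driver and the sc1 class name referenced in the manifest above; note it does not move an already-provisioned volume, and a rescheduled pod still needs a schedulable node in the volume's AZ:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc1
provisioner: ebs.csi.aws.com              # assumes the AWS EBS CSI driver
parameters:
  type: sc1                               # EBS cold HDD volume type
volumeBindingMode: WaitForFirstConsumer   # provision the volume in the AZ where the pod is scheduled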
This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.
This issue was closed because it became stale and did not receive further updates. If the issue is still affecting you, please re-open it, or file a fresh Issue with updated information.