reactive-tech / kubegres

Kubegres is a Kubernetes operator for deploying one or many clusters of PostgreSQL instances and managing database replication, failover and backup.

Home Page: https://www.kubegres.io

Wrong PVC node

coderazzi opened this issue

Hi,
I have been using Kubegres 1.15 since January on an AWS EKS cluster, with a one-replica topology (backups go to an EFS volume).
In August, after a faulty EKS upgrade, I had to restart the whole cluster and manually recovered my databases from a previous backup.
This created two pods: db-kubegres-1-0 (main) and db-kubegres-2-0 (replica).
At some point the main instance failed, the replica was promoted, and a new replica was created.

Curiously, db-kubegres-3-0 was created on node ip-192-168-12-87.eu-west-2.compute.internal,
but its PVC (postgres-db-db-kubegres-3-0) was created with the following annotation in its metadata:
volume.kubernetes.io/selected-node: ip-192-168-69-151.eu-west-2.compute.internal

So the annotation points to the wrong node. In fact, that node is not in my cluster at all, although I assume it refers to a node that belonged to the cluster before.
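One quick way to spot this situation is to check whether the PVC's `volume.kubernetes.io/selected-node` annotation names a node that still exists. Below is a minimal sketch, assuming you feed it the JSON printed by `kubectl get pvc <name> -o json` and `kubectl get nodes -o json`; the helper `stale_selected_node` and the script name are hypothetical, and the code only compares names without talking to the API server.

```python
import json
import sys
from typing import Optional

# Annotation recording the node a PVC's volume was provisioned for.
SELECTED_NODE = "volume.kubernetes.io/selected-node"


def stale_selected_node(pvc: dict, node_list: dict) -> Optional[str]:
    """Return the PVC's selected-node value if it names a node that is not
    in node_list (output of `kubectl get nodes -o json`), else None."""
    annotations = pvc.get("metadata", {}).get("annotations") or {}
    selected = annotations.get(SELECTED_NODE)
    if selected is None:
        return None  # annotation not set: nothing to check
    existing = {n["metadata"]["name"] for n in node_list.get("items", [])}
    return None if selected in existing else selected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_pvc.py pvc.json nodes.json
    with open(sys.argv[1]) as f_pvc, open(sys.argv[2]) as f_nodes:
        stale = stale_selected_node(json.load(f_pvc), json.load(f_nodes))
    print(f"stale selected-node: {stale}" if stale else "selected-node OK")
```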

The problem is that the PVC then provisioned a PV in zone eu-west-2c, a different zone from the one where the pod was placed (eu-west-2a).
As a result, the pod failed to start: 0/2 nodes are available: 2 node(s) had volume node affinity conflict.
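For context on that error: a dynamically provisioned zonal volume (such as an EBS disk) gets a PV whose required `nodeAffinity` pins it to one zone, and the scheduler rejects every node whose labels do not satisfy it. The sketch below approximates that check; the function names are hypothetical, only the `In`, `NotIn`, `Exists` and `DoesNotExist` operators on `matchExpressions` are handled, and the zone label key `topology.kubernetes.io/zone` is an assumption (the actual key depends on the provisioner).

```python
from typing import Dict, List


def expression_matches(expr: dict, labels: Dict[str, str]) -> bool:
    """Evaluate one nodeSelectorRequirement against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values") or []
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values  # an absent label also matches
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    return False  # Gt/Lt are not handled in this sketch


def pv_allows_node(pv: dict, node_labels: Dict[str, str]) -> bool:
    """True if the PV's required nodeAffinity admits a node with these labels.
    nodeSelectorTerms are ORed; matchExpressions inside a term are ANDed."""
    required = (pv.get("spec", {}).get("nodeAffinity") or {}).get("required")
    if not required:
        return True  # no node affinity: usable from any node
    terms: List[dict] = required.get("nodeSelectorTerms") or []

    def term_matches(term: dict) -> bool:
        exprs = term.get("matchExpressions") or []
        # An empty term matches no nodes; matchFields are ignored here.
        return bool(exprs) and all(expression_matches(e, node_labels) for e in exprs)

    return any(term_matches(t) for t in terms)
```

With a PV pinned to eu-west-2c, a node labeled with zone eu-west-2a fails this check, which is the "volume node affinity conflict" reported above.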

Deleting the pod just recreated it with the same PVC, and it failed in the same way.
I had to manually recreate the PVC with the correct annotation and then restart the pod to get it working again.
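The manual workaround could be scripted roughly as follows. This is a sketch under assumptions, not a Kubegres feature: `rebuild_pvc` is a hypothetical helper that takes the old PVC's JSON, strips fields filled in by the API server and the PV controller, removes `spec.volumeName` so a new PV gets provisioned, and sets the selected-node annotation to the node the pod should run on.

```python
import copy
import json
import sys

SELECTED_NODE = "volume.kubernetes.io/selected-node"
# Metadata filled in by the API server; it must not be re-submitted.
SERVER_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                   "managedFields", "selfLink", "finalizers")
# Annotations set by the PV controller when the claim was bound.
BIND_ANNOTATIONS = ("pv.kubernetes.io/bind-completed",
                    "pv.kubernetes.io/bound-by-controller")


def rebuild_pvc(old_pvc: dict, node_name: str) -> dict:
    """Return a fresh PVC manifest derived from old_pvc, pointing at node_name."""
    pvc = copy.deepcopy(old_pvc)  # leave the input untouched
    pvc.pop("status", None)
    meta = pvc.setdefault("metadata", {})
    for field in SERVER_METADATA:
        meta.pop(field, None)
    annotations = meta.get("annotations") or {}
    for key in BIND_ANNOTATIONS:
        annotations.pop(key, None)
    annotations[SELECTED_NODE] = node_name
    meta["annotations"] = annotations
    # Unbind from the old PV so a new one is provisioned for node_name's zone.
    pvc.get("spec", {}).pop("volumeName", None)
    return pvc


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   kubectl get pvc <name> -o json | python rebuild_pvc.py <node-name> > new-pvc.json
    print(json.dumps(rebuild_pvc(json.load(sys.stdin), sys.argv[1]), indent=2))
```

The old PVC has to be deleted before the new manifest is applied (it stays in Terminating while a pod still mounts it), and with a `Delete` reclaim policy its PV and data are removed as well, which is presumably acceptable here only because the affected pod is a replica.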