[BUG] Assigned Replicas on VM Nodes keep flickering around
gogo199432 opened this issue
Describe the bug
The assigned volumes are constantly flickering between two of my nodes for no apparent reason. All nodes are NixOS boxes; the only difference is that the nodes whose counts stay static are physical PCs, while the two flickering nodes are VMs under Proxmox.
This issue doesn't seem to cause any operational issue for the pods themselves, but it is still a bit worrying.
2024-06-13.16-57-06.mp4
To Reproduce
Not sure how to reproduce this; it is just a standard install through kubectl apply, as described in the docs.
Expected behavior
Assigned volumes shouldn't be changing constantly
Support bundle for troubleshooting
supportbundle_76ddf2d1-9b00-4150-8989-cf4f91d61dce_2024-06-13T15-14-38Z.zip
Environment
- Longhorn version: 1.6.2
- Impacted volume (PV): Example: pvc-9446eddc-d0b0-459a-967a-b67e569b1e21. However, many PVs are impacted.
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.30.0+k3s1
- Number of control plane nodes in the cluster: 3
- Number of worker nodes in the cluster: 4
- Node config
  - OS type and version: NixOS 24.05
  - Kernel version: 6.6.32
  - CPU per node: 4
  - Memory per node: 4GB
  - Disk type (e.g. SSD/NVMe/HDD): SSD
  - Network bandwidth between the nodes (Gbps): 1
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM and Baremetal (depending on node)
- Number of Longhorn volumes in the cluster: 17
The nodes nixk3s-vm-agent1 and nixk3s-vm-agent2 have the same diskUUID: 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb. Could you check the solution in #2125 (comment)?
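For anyone who wants to check their own cluster for the same condition, here is a minimal sketch that looks for disk UUIDs reported by more than one Longhorn node. It assumes the node list was exported as JSON (for example with `kubectl -n longhorn-system get nodes.longhorn.io -o json`, where the `longhorn-system` namespace is the default-install assumption) and that each node reports its disks under `status.diskStatus.<disk name>.diskUUID`. The function name `find_duplicate_disk_uuids` is hypothetical and is not part of Longhorn.

```python
from collections import defaultdict


def find_duplicate_disk_uuids(node_list):
    """Return {diskUUID: [(node, disk), ...]} for UUIDs seen on more than one disk.

    `node_list` is the parsed JSON of a Longhorn node list. The field layout
    (status.diskStatus.<disk>.diskUUID) is an assumption about the CR schema.
    """
    owners_by_uuid = defaultdict(list)
    for node in node_list.get("items", []):
        node_name = node.get("metadata", {}).get("name", "<unknown>")
        disk_status = node.get("status", {}).get("diskStatus") or {}
        for disk_name, status in disk_status.items():
            disk_uuid = (status or {}).get("diskUUID")
            if disk_uuid:  # skip disks that have not reported a UUID yet
                owners_by_uuid[disk_uuid].append((node_name, disk_name))
    # Keep only UUIDs that are shared by two or more (node, disk) pairs.
    return {
        disk_uuid: owners
        for disk_uuid, owners in owners_by_uuid.items()
        if len(owners) > 1
    }
```

To use it, load the exported file with `json.load` and pass the result to the function; an empty result means no two disks share a UUID, while any entry lists the node and disk pairs that collide.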
The nodes nixk3s-vm-agent1 and nixk3s-vm-agent2 have the same diskUUID: 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb.
Does it mean we need to make the UUID of each disk unique globally?
I followed the process described in the linked comment. I even went through the trouble of generating new UUIDs for both the drive used to store Longhorn volumes and the boot drive. There doesn't seem to be any noticeable change in behaviour; the counters are still jumping around.