[BUG] Assigned Replicas on VM Nodes keep flickering around

gogo199432 opened this issue 2 months ago

Describe the bug

The assigned volumes keep flickering between two of my nodes for no apparent reason. All nodes are NixOS boxes. The only difference is that the nodes whose counts stay static are physical PCs, while the two affected nodes are VMs under Proxmox.

This doesn't seem to cause any operational problems for the pods themselves, but it is still a bit worrying.

Screen recording: 2024-06-13.16-57-06.mp4

To Reproduce

I'm not sure how to reproduce it. This is just a standard install through kubectl apply, as described in the docs.

Expected behavior

Assigned volumes shouldn't be changing constantly

Support bundle for troubleshooting

supportbundle_76ddf2d1-9b00-4150-8989-cf4f91d61dce_2024-06-13T15-14-38Z.zip

Environment

Longhorn version: 1.6.2
Impacted volume (PV): pvc-9446eddc-d0b0-459a-967a-b67e569b1e21 (one example; many PVs are impacted)
Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.30.0+k3s1
- Number of control plane nodes in the cluster: 3
- Number of worker nodes in the cluster: 4
Node config
- OS type and version: NixOS 24.05
- Kernel version: 6.6.32
- CPU per node: 4
- Memory per node: 4GB
- Disk type (e.g. SSD/NVMe/HDD): SSD
- Network bandwidth between the nodes (Gbps): 1
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM and Baremetal (depending on node)
Number of Longhorn volumes in the cluster: 17

c3y1huang · Answer 1 · Fri Jun 14 2024 11:52:20 GMT+0800 (China Standard Time)

The nodes nixk3s-vm-agent1 and nixk3s-vm-agent2 have the same diskUUID: 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb. Could you check the solution in #2125 (comment)?
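A quick way to look for this kind of collision across the whole cluster is to group every disk by its diskUUID and flag any UUID owned by more than one disk. The sketch below assumes its input is the JSON from `kubectl get nodes.longhorn.io -n longhorn-system -o json` and that each Longhorn Node object reports its disks under `status.diskStatus.<diskName>.diskUUID`. Treat that field layout as an assumption to verify against your Longhorn version. The function name `find_duplicate_disk_uuids` is hypothetical and is not part of Longhorn.

```python
import json
from collections import defaultdict


def find_duplicate_disk_uuids(nodes_json: str) -> dict[str, list[str]]:
    """Return {diskUUID: ["node/disk", ...]} for every UUID shared by more than one disk.

    Assumption: nodes_json is the list output of
    `kubectl get nodes.longhorn.io -n longhorn-system -o json`, and each item
    exposes status.diskStatus.<diskName>.diskUUID.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for item in json.loads(nodes_json).get("items", []):
        node = item["metadata"]["name"]
        # diskStatus may be missing or null on a node that has no disks yet.
        disk_status = item.get("status", {}).get("diskStatus") or {}
        for disk_name, status in disk_status.items():
            uuid = status.get("diskUUID")
            if uuid:
                owners[uuid].append(f"{node}/{disk_name}")
    # Keep only UUIDs that were seen on more than one node/disk pair.
    return {uuid: where for uuid, where in owners.items() if len(where) > 1}
```

Save the kubectl output to a file and pass its contents to the function. An empty result means no two disks report the same diskUUID. A hit like the one above (nixk3s-vm-agent1 and nixk3s-vm-agent2) points at disks that Longhorn cannot tell apart.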

Derek Su · Answer 2 · Fri Jun 14 2024 12:43:32 GMT+0800 (China Standard Time)

Quote: "The nodes nixk3s-vm-agent1 and nixk3s-vm-agent2 have the same diskUUID: 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb."

Does it mean we need to make the UUID of each disk unique globally?

gogo199432 · Answer 3 · Fri Jun 14 2024 23:14:13 GMT+0800 (China Standard Time)

I followed the process described in the linked comment. I even went through the trouble of generating new UUIDs for both the drive used to store Longhorn volumes and the boot drive. There don't seem to be any noticeable changes in behaviour; the counters are still jumping around.