longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes

Home Page: https://longhorn.io

[BUG] Assigned Replicas on VM Nodes keep flickering around

gogo199432 opened this issue

Describe the bug

The number of assigned volumes keeps flickering between two of my nodes for no apparent reason. All nodes are NixOS boxes; the only difference is that the nodes with stable counts are physical PCs, while the two affected nodes are VMs under Proxmox.

This doesn't seem to cause any operational problems for the pods themselves, but it is still a bit worrying.

2024-06-13.16-57-06.mp4

To Reproduce

I'm not sure how to reproduce this. It is a standard install through kubectl apply, as described in the docs.

Expected behavior

The assigned volume counts shouldn't be changing constantly.

Support bundle for troubleshooting

supportbundle_76ddf2d1-9b00-4150-8989-cf4f91d61dce_2024-06-13T15-14-38Z.zip

Environment

  • Longhorn version: 1.6.2
  • Impacted volume (PV): for example, pvc-9446eddc-d0b0-459a-967a-b67e569b1e21; however, many PVs are impacted
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.30.0+k3s1
    • Number of control plane nodes in the cluster: 3
    • Number of worker nodes in the cluster: 4
  • Node config
    • OS type and version: NixOS 24.05
    • Kernel version: 6.6.32
    • CPU per node: 4
    • Memory per node: 4GB
    • Disk type (e.g. SSD/NVMe/HDD): SSD
    • Network bandwidth between the nodes (Gbps): 1
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM and Baremetal (depending on node)
  • Number of Longhorn volumes in the cluster: 17

The nodes nixk3s-vm-agent1 and nixk3s-vm-agent2 have the same diskUUID: 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb. Could you check the solution in #2125 (comment)?
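To spot this condition across the whole cluster, one could compare the diskUUID values reported by every Longhorn node. The sketch below is a minimal, unofficial helper; it assumes its input is the parsed JSON output of kubectl get nodes.longhorn.io -n longhorn-system -o json, and assumes each node's status.diskStatus maps disk names to objects carrying a diskUUID field. The function name find_shared_disk_uuids is hypothetical.

```python
from collections import defaultdict


def find_shared_disk_uuids(nodes_json: dict) -> dict:
    """Map each diskUUID reported by more than one Longhorn node to those nodes.

    Assumption: nodes_json is the parsed output of
    `kubectl get nodes.longhorn.io -n longhorn-system -o json`, where each
    item has status.diskStatus as {disk name: {"diskUUID": ...}}.
    """
    owners = defaultdict(set)  # diskUUID -> set of node names claiming it
    for item in nodes_json.get("items", []):
        node = item["metadata"]["name"]
        disk_status = item.get("status", {}).get("diskStatus") or {}
        for disk in disk_status.values():
            uuid = disk.get("diskUUID")
            if uuid:
                owners[uuid].add(node)
    # Keep only UUIDs claimed by two or more distinct nodes.
    return {uuid: sorted(nodes) for uuid, nodes in owners.items() if len(nodes) > 1}
```

For the cluster in this report, one would expect the result to list 3f1df012-ce47-4257-9ec1-9ffe5c2a4feb against nixk3s-vm-agent1 and nixk3s-vm-agent2; an empty result means no two nodes share a disk UUID.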

Does this mean the UUID of each disk needs to be globally unique?

I followed the process described in the linked comment. I even went through the trouble of generating new UUIDs for both the drive used to store Longhorn volumes and the boot drive. There doesn't seem to be any noticeable change in behaviour; the counters are still jumping around.
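A new filesystem UUID (as shown by blkid) is not necessarily the same identifier as the diskUUID Longhorn reports. As an assumption to verify, Longhorn appears to keep its own small JSON file named longhorn-disk.cfg at the root of each disk path (for example /var/lib/longhorn), with a "diskUUID" key; VMs cloned from one template would then carry identical copies. The minimal sketch below reads that value on a node so the results from different nodes can be compared. The file name, its layout, and the helper name read_longhorn_disk_uuid are assumptions, not a confirmed Longhorn interface.

```python
import json
from pathlib import Path
from typing import Optional


def read_longhorn_disk_uuid(disk_path: str) -> Optional[str]:
    """Return the diskUUID stored in <disk_path>/longhorn-disk.cfg, if any.

    Assumption: the file is JSON with a "diskUUID" key. This value is
    independent of the filesystem UUID that blkid reports.
    """
    cfg = Path(disk_path) / "longhorn-disk.cfg"
    if not cfg.is_file():
        return None  # no Longhorn disk config at this path
    return json.loads(cfg.read_text()).get("diskUUID")
```

Running this against the same disk path on nixk3s-vm-agent1 and nixk3s-vm-agent2 and getting identical strings back would suggest the regenerated filesystem UUIDs did not reach the identifier Longhorn uses.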