OVN hangs when trying to create switch
nadenf opened this issue
Describe the bug
Install process is hanging at the following step:
TASK [../../roles/kubeinit_prepare : Remove and create the cluster switch if exists] *****
=> ovn-nbctl --wait=hv ls-add sw0
According to the documentation, this indicates that the chassis is malfunctioning.
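For context, `--wait=hv` blocks until every hypervisor (chassis) confirms the change, so a hang here usually means no chassis ever registered in the southbound database. A read-only check can be sketched like this; `check_chassis` is a hypothetical helper, and it assumes `ovn-sbctl` is available on the OVN central node:

```shell
# Sketch: count the chassis registered in the OVN southbound DB.
# If this is zero, any `ovn-nbctl --wait=hv` command will block forever.
check_chassis() {
  local n
  # --bare/--columns strip the table formatting down to one name per line
  n=$(ovn-sbctl --bare --columns=name list chassis 2>/dev/null | grep -c .)
  echo "registered chassis: $n"
  [ "$n" -gt 0 ]
}
# check_chassis || echo "no chassis registered; ls-add --wait=hv will hang"
```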
To Reproduce
This will likely not be reproducible for others.
I have tried restarting, running the cleanup playbook, and uninstalling the OVN packages, but the issue still persists.
I have run the ovn-nbctl command and no resources appear to exist, e.g. switches, routers, etc.
Expected behavior
If the OVN setup is somehow in a bad state, it would be useful if the installation script or cleanup process could reset it.
Infrastructure
- Hypervisors OS: Ubuntu
- Version: 20.04
Deployment command
podman run --rm -it \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa:z \
-v /etc/hosts:/etc/hosts \
kubeinit/kubeinit \
--user root \
-v -i ./hosts/eks/inventory \
--become \
--become-user root \
./playbooks/eks.yml
Inventory file diff
30,31c30,31
< disk=20G
< ram=20971520
---
> disk=25G
> ram=25165824
39,40c39,40
< disk=300G
< ram=20971520
---
> disk=30G
> ram=8388608
48,49c48,49
< disk=20G
< ram=20971520
---
> disk=150G
> ram=12582912
64c64
< hypervisor-01 ansible_host=harana
---
> hypervisor-01 ansible_host=nyctea
77,78c77,78
< # eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
< # eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual
---
> eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
> eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual
@nadenf I ran into the same issue a while back.
After uninstalling the openvswitch packages, I also deleted the leftovers on the system, specifically the openvswitch database (/etc/openvswitch/conf.db on Fedora/RHEL).
After that, my issue was resolved.
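For anyone hitting this elsewhere, the reset described above can be sketched as a small script. The defaults below are the Fedora/RHEL path from this thread plus commonly used OVN database locations; `clean_ovn_dbs` is a hypothetical helper, and the service names and OVN DB paths are assumptions that vary by distribution:

```shell
# Sketch of a full OVN/OVS database reset, based on the fix above.
# Run on the hypervisor as root; paths/service names are assumptions.
clean_ovn_dbs() {
  local ovs_db="${1:-/etc/openvswitch/conf.db}"   # local Open vSwitch DB
  local ovn_db_dir="${2:-/var/lib/ovn}"           # OVN NB/SB DBs (assumed path)
  # Stop the daemons first so nothing recreates the files mid-delete.
  systemctl stop ovn-northd openvswitch 2>/dev/null || true
  rm -f "$ovs_db"
  rm -f "$ovn_db_dir/ovnnb_db.db" "$ovn_db_dir/ovnsb_db.db"
}
# The databases are recreated empty when the services restart:
# clean_ovn_dbs && systemctl start openvswitch ovn-northd
```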
Worth adding this to a troubleshooting section?
Or would it be better to add a timeout for this step and then print a useful error message?
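A timeout could be approximated even without playbook changes by wrapping the call with coreutils `timeout`; this is only a sketch, where `run_with_timeout` and `OVN_CMD_TIMEOUT` are illustrative names and the 60-second ceiling is an arbitrary choice:

```shell
# Sketch: fail with a hint instead of hanging indefinitely.
OVN_CMD_TIMEOUT="${OVN_CMD_TIMEOUT:-60}"
run_with_timeout() {
  # coreutils `timeout` exits 124 when the time limit is hit
  if ! timeout "$OVN_CMD_TIMEOUT" "$@"; then
    echo "command failed or timed out: $*" >&2
    echo "check 'ovn-sbctl show' for registered chassis before retrying" >&2
    return 1
  fi
}
# Intended use, mirroring the hanging task:
# run_with_timeout ovn-nbctl --wait=hv ls-add sw0
```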
This looks like something specific to your environment, but we could definitely add /etc/openvswitch/conf.db to the cleanup step for the OVN config.
Fixed by: #380