OVN hangs when trying to create switch
nadenf opened this issue
Describe the bug
Install process is hanging at the following step:
TASK [../../roles/kubeinit_prepare : Remove and create the cluster switch if exists] *****
=> ovn-nbctl --wait=hv ls-add sw0
According to the documentation, this indicates that the chassis is malfunctioning.
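For context, `--wait=hv` blocks until every hypervisor (chassis) confirms the change, so a hang here usually means no chassis ever registered in the southbound database. A read-only check can be sketched like this; `check_chassis` is a hypothetical helper, and it assumes `ovn-sbctl` is available on the OVN central node:

```shell
# Sketch: count the chassis registered in the OVN southbound DB.
# If this is zero, any `ovn-nbctl --wait=hv` command will block forever.
check_chassis() {
  local n
  # --bare/--columns strip the table formatting down to one name per line
  n=$(ovn-sbctl --bare --columns=name list chassis 2>/dev/null | grep -c .)
  echo "registered chassis: $n"
  [ "$n" -gt 0 ]
}
# check_chassis || echo "no chassis registered; ls-add --wait=hv will hang"
```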
To Reproduce
This will likely not be reproducible for others.
I have tried restarting, running the cleanup playbook, and uninstalling the OVN packages, but the issue still persists.
I have run the ovn-nbctl command and no resources appear to exist, e.g. switches, routers, etc.
Expected behavior
If the OVN setup is somehow in a bad state, it would be useful if the installation script or cleanup process could reset it.
Infrastructure
- Hypervisors OS: Ubuntu
- Version: 20.04
Deployment command
podman run --rm -it \
-v ~/.ssh/id_rsa:/root/.ssh/id_rsa:z \
-v /etc/hosts:/etc/hosts \
kubeinit/kubeinit \
--user root \
-v -i ./hosts/eks/inventory \
--become \
--become-user root \
./playbooks/eks.yml
Inventory file diff
30,31c30,31
< disk=20G
< ram=20971520
---
> disk=25G
> ram=25165824
39,40c39,40
< disk=300G
< ram=20971520
---
> disk=30G
> ram=8388608
48,49c48,49
< disk=20G
< ram=20971520
---
> disk=150G
> ram=12582912
64c64
< hypervisor-01 ansible_host=harana
---
> hypervisor-01 ansible_host=nyctea
77,78c77,78
< # eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
< # eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual
---
> eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
> eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual
@nadenf I ran into the same issue a while back.
After uninstalling the openvswitch packages, I also deleted the leftovers on the system, specifically the openvswitch database (/etc/openvswitch/conf.db on Fedora/RHEL).
After that, my issue was resolved.
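For anyone hitting this elsewhere, the reset described above can be sketched as a small script. The defaults below are the Fedora/RHEL path from this thread plus commonly used OVN database locations; `clean_ovn_dbs` is a hypothetical helper, and the service names and OVN DB paths are assumptions that vary by distribution:

```shell
# Sketch of a full OVN/OVS database reset, based on the fix above.
# Run on the hypervisor as root; paths/service names are assumptions.
clean_ovn_dbs() {
  local ovs_db="${1:-/etc/openvswitch/conf.db}"   # local Open vSwitch DB
  local ovn_db_dir="${2:-/var/lib/ovn}"           # OVN NB/SB DBs (assumed path)
  # Stop the daemons first so nothing recreates the files mid-delete.
  systemctl stop ovn-northd openvswitch 2>/dev/null || true
  rm -f "$ovs_db"
  rm -f "$ovn_db_dir/ovnnb_db.db" "$ovn_db_dir/ovnsb_db.db"
}
# The databases are recreated empty when the services restart:
# clean_ovn_dbs && systemctl start openvswitch ovn-northd
```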
Worth adding this to a troubleshooting section?
Or would it be better to add a timeout for this step and then print a useful error message?
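A timeout could be approximated even without playbook changes by wrapping the call with coreutils `timeout`; this is only a sketch, where `run_with_timeout` and `OVN_CMD_TIMEOUT` are illustrative names and the 60-second ceiling is an arbitrary choice:

```shell
# Sketch: fail with a hint instead of hanging indefinitely.
OVN_CMD_TIMEOUT="${OVN_CMD_TIMEOUT:-60}"
run_with_timeout() {
  # coreutils `timeout` exits 124 when the time limit is hit
  if ! timeout "$OVN_CMD_TIMEOUT" "$@"; then
    echo "command failed or timed out: $*" >&2
    echo "check 'ovn-sbctl show' for registered chassis before retrying" >&2
    return 1
  fi
}
# Intended use, mirroring the hanging task:
# run_with_timeout ovn-nbctl --wait=hv ls-add sw0
```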
This looks like something specific to your environment, but we could definitely add /etc/openvswitch/conf.db to the cleanup step for the OVN config.
Fixed by: #380