Kubeinit / kubeinit

Ansible automation to have a KUBErnetes cluster INITialized as soon as possible...

Home Page:https://www.kubeinit.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

OVN hangs when trying to create switch

nadenf opened this issue · comments

Describe the bug
Install process is hanging at the following step:

TASK [../../roles/kubeinit_prepare : Remove and create the cluster switch if exists] *****
=> ovn-nbctl --wait=hv ls-add sw0

Which according to documentation indicates that the chassis is malfunctioning.

To Reproduce
This will likely not be reproducible for others.
But have tried restarting, running the cleanup playbook, uninstalling OVN packages and issue still persists.
I have run the ovn-nbctl command and no resources looks to be there e.g. switches, routers etc.

Expected behavior
If the OVN setup is in a bad state somehow it would be useful if the installation script or cleanup process could reset.

Infrastructure

  • Hypervisors OS: Ubuntu
  • Version: 20.04

Deployment command

podman run --rm -it \
    -v ~/.ssh/id_rsa:/root/.ssh/id_rsa:z \
    -v /etc/hosts:/etc/hosts \
    kubeinit/kubeinit \
        --user root \
        -v -i ./hosts/eks/inventory \
        --become \
        --become-user root \
        ./playbooks/eks.yml

Inventory file diff

30,31c30,31
< disk=20G
< ram=20971520
---
> disk=25G
> ram=25165824
39,40c39,40
< disk=300G
< ram=20971520
---
> disk=30G
> ram=8388608
48,49c48,49
< disk=20G
< ram=20971520
---
> disk=150G
> ram=12582912
64c64
< hypervisor-01 ansible_host=harana
---
> hypervisor-01 ansible_host=nyctea
77,78c77,78
< # eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
< # eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual
---
> eks-controller-02 ansible_host=10.0.0.2 mac=52:54:00:75:99:92 interfaceid=6ec8a8af-1930-4288-b732-937d7ce08d54 target=hypervisor-01 type=virtual
> eks-controller-03 ansible_host=10.0.0.3 mac=52:54:00:96:68:89 interfaceid=25a13077-4b03-4eba-a52f-8b4048275d0c target=hypervisor-01 type=virtual

@nadenf I ran into the same issue a while back.
After uninstall of the openvswitch packages I also deleted the left-overs on the system specifically the openvswitch database (/etc/openvswitch/conf.db on Fedora/RHEL)
After that my issue was resolved

Worth adding this to a troubleshooting section ?

Or is it better to have a timeout for this step and then print some useful error message.

This looks like something specific to your environment, but we could definetely add /etc/openvswitch/conf.db to the cleanup step for the OVN config.

@ccamacho .. Thanks really appreciate it.