Kubeinit / kubeinit

Ansible automation to have a KUBErnetes cluster INITialized as soon as possible...

Home page: https://www.kubeinit.org


openvswitch and routes not working after restart on Rocky-Linux 8.5

Tokix opened this issue · comments

commented

Description
With a bit of modification of the kubeinit files I am able to get okd deployed on rocky-linux 8.5. You can see the modifications here: https://github.com/Tokix/kubeinit I could make a pull request but there is one thing that is not working as expected and that is the restart of the server. After the restart the routes are vanished and I'm not able to reach the frontend anymore.

To Reproduce
Steps to reproduce the behavior:

  1. Install a Rocky Linux 8.5 machine and set up the ssh connection as nyctea as described in the manual
  2. In my case I additionally had to install Python on the hypervisor_host machine before the playbook would run successfully:

yum install python3

  3. Clone the changes for Rocky 8.5:

git clone https://github.com/Tokix/kubeinit.git

  4. Run the playbook:
ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

  5. Enable the frontend:
ssh root@nyctea
chmod +x create-external-ingress.sh
./create-external-ingress.sh
  6. Set up the DNS entries for your system
  7. Check if the URL is working (it works at this point):

https://console-openshift-console.apps.okdcluster.kubeinit.local/

  8. Reboot the server:

init 6

  9. The URL is no longer working:

https://console-openshift-console.apps.okdcluster.kubeinit.local/

Expected behavior
The external URL of the cluster should be available after a restart, and the routes should be set.

Screenshots
Working route-configuration before the restart:

[screenshot]

Route configuration after restart:

[screenshot]

Infrastructure

  • Hypervisor OS: Rocky Linux
  • Version: 8.5

Deployment command

ansible-playbook \
    -v --user root \
    -e kubeinit_spec=okd-libvirt-3-1-1 \
    -i ./kubeinit/inventory \
    ./kubeinit/playbook.yml

Inventory file diff

I made no changes to the inventory file.

Additional context

As SELinux is active on Rocky Linux 8.5, my first thought was that some changes could not be persisted, so I disabled SELinux for testing. However, it still does not work after a restart.

I checked this old issue https://forums.opensuse.org/showthread.php/530879-openvswitch-loses-configuration-on-reboot but the boot order of openvswitch and network.service seems to be fine.

Furthermore, I re-ran the steps under "Attach our cluster network to the logical router" in kubeinit/roles/kubeinit_libvirt/tasks/create_network.yml. This got me back to the correct routing table, but I am still not able to reach the guest systems via 10.0.0.1-x.
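
Since re-running those tasks restores the routing table, one stopgap (a sketch, not part of kubeinit) is a small helper that re-applies the route after every reboot. The 10.0.0.0/24 subnet comes from the issue; the gateway address 172.16.0.1 and the device name br-ex are placeholder assumptions, not kubeinit's actual values:

```shell
# Re-apply the cluster route that the "Attach our cluster network to the
# logical router" tasks create. Gateway and device are assumptions.
apply_cluster_route() {
    run="$1"    # pass "echo" to preview the command, "" to execute for real
    $run ip route replace 10.0.0.0/24 via 172.16.0.1 dev br-ex
}

# Preview what would run (as root on the hypervisor, call: apply_cluster_route "")
apply_cluster_route echo
```

The real gateway and device can be read from the working routing table before the reboot (`ip route show`) and substituted in.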

Is there any script or service that needs to be, or can be, re-run to enable the networking after a reboot?
I am thankful for any hints; let me know if you need more information.
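
If no such service exists yet, one option would be a systemd oneshot unit ordered after openvswitch and libvirt that re-runs the route setup at boot. Everything below is an assumption (unit name, script path); the file is written to the current directory for illustration, whereas on the hypervisor it would go to /etc/systemd/system/:

```shell
# Sketch of a boot-time unit that re-applies the cluster routes.
# /usr/local/bin/kubeinit-restore-routes.sh is a hypothetical script that
# would contain the route commands from create_network.yml.
cat > kubeinit-restore-routes.service <<'EOF'
[Unit]
Description=Re-apply KubeInit cluster routes after boot (sketch)
After=openvswitch.service libvirtd.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/kubeinit-restore-routes.sh

[Install]
WantedBy=multi-user.target
EOF
```

If this pans out, it would be installed with `sudo cp kubeinit-restore-routes.service /etc/systemd/system/`, then `sudo systemctl daemon-reload && sudo systemctl enable kubeinit-restore-routes.service`.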

Thank you in any case for the great project :)

I'm also running into a problem with Rocky Linux.

Any help is welcome, this is a cool project, I hope we can get it working on Rocky.

TASK [kubeinit.kubeinit.kubeinit_prepare : Create ssh config file from template] *******************************************************************************
task path: /home/jeff/.ansible/collections/ansible_collections/kubeinit/kubeinit/roles/kubeinit_prepare/tasks/create_host_ssh_config.yml:52
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: jeff
<127.0.0.1> EXEC /bin/sh -c 'echo ~jeff && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/jeff/.ansible/tmp `"&& mkdir "` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" && echo ansible-tmp-1650765476.136161-198434-213715344205742="` echo /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742 `" ) && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /home/jeff/.ansible/tmp/ansible-tmp-1650765476.136161-198434-213715344205742/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1117, in do_template
    res = j2_concat(rf)
  File "<template>", line 47, in root
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/jinja2/runtime.py", line 903, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/plugins/action/template.py", line 146, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False)
  File "/home/jeff/kubeinit/kubeinit/lib64/python3.6/site-packages/ansible/template/__init__.py", line 1154, in do_template
    raise AnsibleUndefinedVariable(e)
ansible.errors.AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'ansible_host'"
}

PLAY RECAP *****************************************************************************************************************************************************
localhost                  : ok=48   changed=7    unreachable=0    failed=1    skipped=25   rescued=0    ignored=0   
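
The `AnsibleUndefinedVariable` above usually means some host in the inventory never got `ansible_host` set when the ssh config template was rendered. A small filter like this (a sketch, not part of kubeinit) lists the offending hosts from `ansible-inventory --list` JSON output:

```shell
# Print every inventory host whose hostvars lack ansible_host.
# Reads the JSON that `ansible-inventory --list` emits on stdin.
hosts_missing_ansible_host() {
    python3 -c 'import json, sys
hv = json.load(sys.stdin)["_meta"]["hostvars"]
for host, vars_ in sorted(hv.items()):
    if "ansible_host" not in vars_:
        print(host)'
}

# Example with a tiny inline inventory dump (real usage would pipe in
# `ansible-inventory -i ./kubeinit/inventory --list`):
echo '{"_meta":{"hostvars":{"ok":{"ansible_host":"10.0.0.1"},"bad":{}}}}' \
  | hosts_missing_ansible_host
```

On the failing machine, piping the actual inventory dump through the function should show which host definition is incomplete.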

My issue isn't specific to Rocky, so I'll add a new issue.

I ran into the same error using Debian.

Edit (Issue added): #647

Maybe some iptables rules are not persisted after rebooting, and I don't have a way to test this on Rocky.
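
One way to test that hypothesis on any affected host is to diff the live rules against what is expected. The helper below (a sketch; the rule patterns are assumptions based on the 10.0.0.0/24 guest network, not kubeinit's actual rule set) reports which expected fragments are absent from an `iptables-save` dump:

```shell
# Report which expected rule fragments are missing from an iptables-save dump.
missing_rules() {
    dump="$1"
    for pat in "10.0.0.0/24" "MASQUERADE"; do
        case "$dump" in
            *"$pat"*) : ;;                   # fragment present
            *) echo "missing: $pat" ;;       # fragment absent
        esac
    done
}

# Example with a dump that covers the guest subnet but never masquerades it:
missing_rules "-A POSTROUTING -s 10.0.0.0/24 -j ACCEPT"
```

On the hypervisor it would be run as `missing_rules "$(sudo iptables-save)"` before and after the reboot; if rules do disappear, on RHEL-family hosts they are typically persisted via the iptables-services package (`iptables-save > /etc/sysconfig/iptables`).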

commented

Hi @ccamacho,

Thanks for the awesome project. 👍

I am also running into the same issue. After a reboot, I am not able to reach 10.0.0.x.
Is there a way to re-enable the networking after a reboot?

commented

I've got two servers, one with Alma 8.x (which also seems to lose connectivity after a reboot) and one with CentOS Stream. I could help by providing some debug data; I can sacrifice my currently running clusters if need be.

commented

Okay, so the one with CentOS 8 and vanilla k8s didn't persist after restart. The VMs launched fine, but there was no networking. Also, the service pod only had one IP address, from the 10.89.x.x subnet.