oasis-roles / ansible_collection_system

reboot: Adding retry to the reboot role

shwetha-h-p opened this issue

Sometimes the reboot of a node fails with the following error.

This is very intermittent.

TASK [reboot : Rebooting system] **********************************************************************************************************************************************************************************
fatal: [10.0.100.91]: FAILED! => {"msg": "Failed to determine system distribution. /bin/sh: powershell: command not found, Shared connection to 10.0.100.91 closed."}
fatal: [10.0.103.80]: FAILED! => {"msg": "Failed to determine system distribution. /bin/sh: powershell: command not found, Shared connection to 10.0.103.80 closed."}
fatal: [10.0.102.243]: FAILED! => {"msg": "Failed to determine system distribution. /bin/sh: powershell: command not found, Shared connection to 10.0.102.243 closed."}
fatal: [10.0.103.245]: FAILED! => {"msg": "Failed to determine system distribution. /bin/sh: powershell: command not found, Shared connection to 10.0.103.245 closed."}
fatal: [10.0.102.9]: FAILED! => {"msg": "Failed to determine system distribution. /bin/sh: powershell: command not found, Shared connection to 10.0.102.9 closed."}
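
A retry around the reboot task would likely cover this. A minimal sketch, assuming the role's "Rebooting system" task uses the reboot module; reboot_retries is a hypothetical new variable for the attempt count, and the timeout/delay values are assumptions:

  - name: Rebooting system
    reboot:
      reboot_timeout: 600                            # assumption: keep whatever timeout the role already uses
    register: reboot_result
    # Retry the whole reboot task a few times before failing the host.
    retries: "{{ reboot_retries | default(3) }}"     # hypothetical variable, not currently in the role
    delay: 30
    until: reboot_result is succeeded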

What distribution version are you running in this case? Please provide a playbook snippet for your usage of this role, including any group vars which influence the role's behavior.

Case 1:

  # 8.2. Registering the Operating System for Nodes
  - name: Registering and updating overcloud
    hosts: osp_overcloud
    roles:
      - role: oasis_roles.system.rhsm
      - role: oasis_roles.system.package_updater
      - role: oasis_roles.system.reboot

I am using the reboot role after registering the systems and running a yum update on them.

Case 2:

  - name: Update /etc/resolv.conf file
    hosts: osp_undercloud:osp_overcloud
    roles:
      - role: oasis_roles.system.reboot
      - role: oasis_roles.system.dns_resolv
        dns_resolv_servers: "{{ nameservers }}"

After provisioning the nodes on the OpenStack tenant, we reboot them and then add nameservers to them.

The group_vars mostly contain RHSM-related vars, e.g. rhsm_repositories, rhsm_pool_ids, etc.
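
For reference, the group_vars look roughly like this; the values below are placeholders, and the exact structure of these variables follows the role defaults rather than anything shown here:

  # group_vars/osp_overcloud.yml -- illustrative placeholders only
  rhsm_pool_ids:
    - 0123456789abcdef0123456789abcdef   # placeholder pool ID
  rhsm_repositories:
    - rhel-7-server-rpms
    - rhel-7-server-extras-rpms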

We are provisioning all VMs with OS_IMAGE="RHEL-7.7-Server-x86_64-production-latest".

Error message:

fatal: []: FAILED! => {"msg": "Failed to determine system distribution. gl=38;5;13:.dl=38;5;13:.xcf=38;5;13:.xwd=38;5;13:.yuv=38;5;13:.cgm=38;5;13:.emf=38;5;13:.axv=38;5;13:.anx=38;5;13:.ogv=38;5;13:.ogx=38;5;13:.aac=38;5;45:.au=38;5;45:.flac=38;5;45:.mid=38;5;45:.midi=38;5;45:.mka=38;5;45:.mp3=38;5;45:.mpc=38;5;45:.ogg=38;5;45:.ra=38;5;45:.wav=38;5;45:.axa=38;5;45:.oga=38;5;45:.spx=38;5;45:*.xspf=38;5;45:", "HOME": "/root", "_": "/usr/bin/python"}, "ansible_distribution_major_version": "7", "module_setup": true, "ansible_hostname": "psi-c0-dev-osp13-ctrl-msg-3", "ansible_real_group_id": 0, "ansible_lsb": {}, "ansible_proc_cmdline": {"no_timer_check": true, "LANG": "en_US.UTF-8", "console": ["tty0", "ttyS0,115200n8"], "net.ifnames": "0", "crashkernel": "auto", "BOOT_IMAGE": "/boot/vmlinuz-3.10.0-1127.13.1.el7.x86_64", "ro": true, "root": "UUID=3425ddda-aed9-4a29-886c-3bd6b776b539"}, "ansible_local": {}, "ansible_machine": "x86_64", "ansible_ssh_host_key_rsa_public": "AAAAB3NzaC1yc2EAAAADAQABAAABAQCushvVQglIKOPnrVvaoCUxJVKsL3EsPWilJufdpP4rwqkWn4Tx8WjbvJcaavHospJdJVM2mKVb/2MS7h7EFd5Rufg7S07iP8Kizee8T2vVO+zTMs/HPysrrbcizPXFAY8Skl3A+1mw/6FrM1EQbzrIoirl+z+E7jmo6KgUbbpbciLwzFSgROqGYvqwIHVDwe0CHJdMJlPl7sfAeJh8HsMCtDczJSCMDoV++l3WoxexudaJ+K4JiTW328SnLqwwMQh5VkrjqA8yX+OAfDsG6dU3SbhTx4rb84y7uJALCjtZTCW/zkk2zeXhH+55g55jtXuXirZSyFI5tZpBZTbQd1M1", "ansible_system_capabilities_enforced": "True", "ansible_user_gecos": "root", "ansible_system_capabilities": ["cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap", "cap_mac_override", "cap_mac_admin", "cap_syslog", "35", "36+ep"], "ansible_python": {"executable": "/usr/bin/python", "version": {"micro": 5, "major": 2, "releaselevel": "final", "serial": 0, "minor": 7}, "type": "CPython", "has_sslcontext": true, "version_info": [2, 7, 5, "final", 0]}, "ansible_selinux": {"status": "enabled", "policyvers": 31, "type": "targeted", "mode": "enforcing", "config_mode": "enforcing"}, "ansible_fqdn": "psi-c0-dev-osp13-ctrl-msg-3", "ansible_user_gid": 0, "ansible_python_version": "2.7.5", "ansible_system": "Linux", "ansible_user_shell": "/bin/bash", "ansible_kernel": "3.10.0-1127.13.1.el7.x86_64", "ansible_nodename": "psi-c0-dev-osp13-ctrl-msg-3"}}, Shared connection to closed."}

$ which reboot
/usr/sbin/reboot
[stack@psi-c0-dev-osp13-ctrl-msg-3 ~]$ which shutdown
/usr/sbin/shutdown
[stack@psi-c0-dev-osp13-ctrl-msg-3 ~]$
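
As a workaround while the retry is discussed, the reboot module's distribution detection can be sidestepped entirely by issuing the reboot asynchronously and waiting for the connection to return. A rough sketch; the shutdown path comes from the output above, and the delay/timeout values are assumptions:

  - name: Reboot via shutdown (fallback that avoids the reboot module's distribution check)
    shell: sleep 2 && /usr/sbin/shutdown -r now "Ansible triggered reboot"
    async: 1
    poll: 0

  - name: Wait for the node to come back
    wait_for_connection:
      delay: 30        # assumption: give the node time to actually go down first
      timeout: 600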