Proxmox Cloud-init deploy, failing at "Copy vip manifest to first master" task.
untraceablez opened this issue
The issue occurs when running ansible-playbook site.yml.
The playbook runs up until the "Copy vip manifest to first master" task, at which point it fails.
Expected Behavior
The playbook should run all the way through and set up an HA cluster with 3 control nodes and 7 worker nodes.
Current Behavior
The Ansible playbook runs fine until the task "Copy vip manifest to first master", then stops and reports a failure for all 3 master nodes, while all 7 worker nodes go through the playbook without errors.
Steps to Reproduce
- Forked the repo
- Changed the inventory to match my local LAN IPs for the VMs
- Changed the variables in all.yml to match my local environment (including IPs, since I'm on a 10.0.0.0/24 network)
- Adjusted ansible.cfg to point to the inventory location and added my private SSH key
- Removed the raspberry-pi role from the playbook
Context (variables)
Operating system:
Hypervisor: Proxmox VE 8
Ansible Controller OS: Ubuntu 22.04
Node OS: Ubuntu 22.04, based on the jammy cloud-init image.
Hardware:
Intel i9 13900
RAM 128 GB DDR5 5200MHz
MSI Pro Series Z790 Motherboard
4 NVMe RAID-10 Array (4 x 1TB)
2 HDD MIRROR Array (2 x 6TB)
Variables Used
all.yml
k3s_version: "v1.25.12+k3s1"
ansible_user: ansible
systemd_dir: "/etc/systemd/system"
flannel_iface: "eth0"
apiserver_endpoint: "10.0.0.222"
k3s_token: "NA"
extra_server_args:
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args:
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}
kube_vip_tag_version: "v0.5.12"
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"
metal_lb_ip_range: "10.0.0.80-10.0.0.90"
Hosts
host.ini
[k3s_cluster:children]
master
node
[master]
node01 ansible_host=10.0.0.178
node02 ansible_host=10.0.0.225
node03 ansible_host=10.0.0.47
[node]
node04 ansible_host=10.0.0.251
node05 ansible_host=10.0.0.142
node06 ansible_host=10.0.0.237
node07 ansible_host=10.0.0.137
node08 ansible_host=10.0.0.231
node09 ansible_host=10.0.0.118
node10 ansible_host=10.0.0.67
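With this many hand-edited addresses, a quick consistency check of the values above may be worth running before digging further. The sketch below is only an illustration and is not claimed to explain the failure: it assumes that apiserver_endpoint is meant to be a free virtual IP that no node already holds, and that it should sit outside metal_lb_ip_range. The function names (parse_range, check_addresses) are hypothetical helpers, not part of the playbook.

```python
import ipaddress

# Values copied from the all.yml and host.ini shown above.
APISERVER_ENDPOINT = "10.0.0.222"
METAL_LB_IP_RANGE = "10.0.0.80-10.0.0.90"
NODE_IPS = {
    "node01": "10.0.0.178",
    "node02": "10.0.0.225",
    "node03": "10.0.0.47",
    "node04": "10.0.0.251",
    "node05": "10.0.0.142",
    "node06": "10.0.0.237",
    "node07": "10.0.0.137",
    "node08": "10.0.0.231",
    "node09": "10.0.0.118",
    "node10": "10.0.0.67",
}


def parse_range(text):
    """Split a 'start-end' string into two IP address objects."""
    start, end = (ipaddress.ip_address(part.strip()) for part in text.split("-"))
    return start, end


def check_addresses(vip, lb_range, node_ips):
    """Return a list of human-readable problems (empty if none are found).

    Assumption: the VIP must not equal any node IP, and neither the VIP
    nor any node IP should fall inside the load balancer range.
    """
    problems = []
    vip_addr = ipaddress.ip_address(vip)
    start, end = parse_range(lb_range)

    for name, ip in node_ips.items():
        addr = ipaddress.ip_address(ip)
        if addr == vip_addr:
            problems.append(f"{name} uses the same address as the VIP ({vip})")
        if start <= addr <= end:
            problems.append(f"{name} ({ip}) is inside the load balancer range {lb_range}")

    if start <= vip_addr <= end:
        problems.append(f"VIP {vip} is inside the load balancer range {lb_range}")

    return problems


if __name__ == "__main__":
    found = check_addresses(APISERVER_ENDPOINT, METAL_LB_IP_RANGE, NODE_IPS)
    for problem in found:
        print(problem)
    if not found:
        print("No address collisions found")
```

For the values posted here the check reports no collisions, so under these assumptions the addressing itself does not look like the culprit.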
Possible Solution
- I've checked the General Troubleshooting Guide
I actually resolved this by recreating the nodes from cloud-init and changing the naming scheme from node01, node02, etc. to control-node01... and worker-node01... Not sure why that made a difference, but it did! I suspect there's something else I did right this time that I only thought I'd done correctly before.