TASK [k3s_agent : Enable and check K3s service] hanging forever
AshDevFr opened this issue
Expected Behavior
The playbook proceeds past this task.
Current Behavior
When running the playbook, it hangs at the k3s_agent : Enable and check K3s service task.
I tried several times to reset and re-run, without success.
I checked the discussion here but it did not fix the issue I have.
If I use my master node's IP address as the endpoint, as FrostyFitz suggested in the discussion, it works; with any other address it does not.
It's almost as if the VIP address is never created: it does not respond to ping.
I've checked all the nodes and eth0 exists on each of them.
My token is also correct.
I tried using both the same network for the virtual IP and the IP range (10.193.1.1/24) and a different network (10.193.20.1/24) to have more IPs, but the result is the same.
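As a quick sanity check on the "VIP never responds to ping" symptom, the following sketch compares the VIP's subnet with a node's subnet (values taken from this issue; it assumes a /24 mask as in the variables below). kube-vip announces the VIP via ARP, so a VIP outside the nodes' subnet is generally unreachable from other hosts without extra routing, because they send the traffic to their gateway instead of ARPing for it.

```shell
# Values from this issue; assumes a /24, as configured in all.yml
vip=10.193.20.10      # apiserver_endpoint
node=10.193.1.155     # first master from hosts file

if [ "${vip%.*}" = "${node%.*}" ]; then
  echo "same /24: ARP announcement of the VIP should work"
else
  echo "different /24: hosts will route via their gateway instead of ARPing for the VIP"
fi
```

Here the endpoint (10.193.20.10) is in a different /24 than the masters (10.193.1.x), which would match the observed behavior of a master-node IP working while the VIP does not.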
Steps to Reproduce
- Clone the project
- Update variables
- Run
Context (variables)
Operating system:
Ubuntu 22.04
Hardware:
Running 5 VMs on Proxmox, all created with Terraform from a cloud-init template.
Variables Used
all.yml
k3s_version: v1.25.12+k3s1
ansible_user: ubuntu
systemd_dir: /etc/systemd/system
flannel_iface: "eth0"
apiserver_endpoint: "10.193.20.10"
k3s_token: "sKcyohCecVULptzpvatzHrYagPGL4mfN"
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}
kube_vip_tag_version: "v0.5.12"
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"
metal_lb_ip_range: "10.193.20.20-10.193.20.99"
Hosts
host.ini
[master]
10.193.1.[155:157]
[node]
10.193.1.[158:159]
[k3s_cluster:children]
master
node
Logs
On the master node
Sep 24 21:34:05 k3s-1 k3s[3394]: E0924 21:34:05.414503 3394 secret.go:192] Couldn't get secret metallb-system/memberlist: secret "memberlist" not found
Sep 24 21:34:05 k3s-1 k3s[3394]: E0924 21:34:05.415375 3394 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist podName:cefc73a3-6380-43f3-8b55-90c9beeedae1 nodeName:}" failed. No retries permitted until 2023-09-24 21:34:37.415339846 -0600 MDT m=+82.612753266 (durationBeforeRetry 32s). Error: MountVolume.SetUp failed for volume "memberlist" (UniqueName: "kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist") pod "speaker-wxc6m" (UID: "cefc73a3-6380-43f3-8b55-90c9beeedae1") : secret "memberlist" not found
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.514639 3394 shared_informer.go:259] Caches are synced for resource quota
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.535098 3394 shared_informer.go:259] Caches are synced for garbage collector
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.563102 3394 shared_informer.go:259] Caches are synced for resource quota
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.585626 3394 shared_informer.go:259] Caches are synced for garbage collector
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.585698 3394 garbagecollector.go:163] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.586814 3394 trace.go:219] Trace[1377249825]: "Proxy via http_connect protocol over tcp" address:10.42.0.4:10250 (24-Sep-2023 21:34:04.982) (total time: 604ms):
Sep 24 21:34:05 k3s-1 k3s[3394]: Trace[1377249825]: [604.629966ms] [604.629966ms] END
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.587493 3394 trace.go:219] Trace[1971106780]: "Proxy via http_connect protocol over tcp" address:10.42.0.4:10250 (24-Sep-2023 21:34:04.981) (total time: 606ms):
Sep 24 21:34:05 k3s-1 k3s[3394]: Trace[1971106780]: [606.059786ms] [606.059786ms] END
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.587659 3394 trace.go:219] Trace[867571666]: "Proxy via http_connect protocol over tcp" address:10.42.0.4:10250 (24-Sep-2023 21:34:04.982) (total time: 604ms):
Sep 24 21:34:05 k3s-1 k3s[3394]: Trace[867571666]: [604.799982ms] [604.799982ms] END
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.587830 3394 trace.go:219] Trace[1441376887]: "Proxy via http_connect protocol over tcp" address:10.42.0.4:10250 (24-Sep-2023 21:34:04.983) (total time: 604ms):
Sep 24 21:34:05 k3s-1 k3s[3394]: Trace[1441376887]: [604.256031ms] [604.256031ms] END
Sep 24 21:34:05 k3s-1 k3s[3394]: I0924 21:34:05.587994 3394 trace.go:219] Trace[33641745]: "Proxy via http_connect protocol over tcp" address:10.42.0.4:10250 (24-Sep-2023 21:34:04.980) (total time: 607ms):
Sep 24 21:34:05 k3s-1 k3s[3394]: Trace[33641745]: [607.771609ms] [607.771609ms] END
Sep 24 21:34:05 k3s-1 k3s[3394]: E0924 21:34:05.613528 3394 available_controller.go:524] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
Sep 24 21:34:05 k3s-1 k3s[3394]: {"level":"info","ts":"2023-09-24T21:34:05.971-0600","caller":"traceutil/trace.go:171","msg":"trace[911155418] transaction","detail":"{read_only:false; response_revision:1311; number_of_response:1; }","duration":"106.54913ms","start":"2023-09-24T21:34:05.865-0600","end":"2023-09-24T21:34:05.971-0600","steps":["trace[911155418] 'process raft request' (duration: 106.291865ms)"],"step_count":1}
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.294794 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"24e933ea1791164db9c6855da0d8b66bbb263af9857ee8e53d5806b9eb4fdc98\": not found" containerID="24e933ea1791164db9c6855da0d8b66bbb263af9857ee8e53d5806b9eb4fdc98"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.296444 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="24e933ea1791164db9c6855da0d8b66bbb263af9857ee8e53d5806b9eb4fdc98" err="rpc error: code = NotFound desc = an error occurred when try to find container \"24e933ea1791164db9c6855da0d8b66bbb263af9857ee8e53d5806b9eb4fdc98\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.303636 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"447e3faf5b21addde533ac153954515a99aad4540b75a7d9cc37af23ef5ea000\": not found" containerID="447e3faf5b21addde533ac153954515a99aad4540b75a7d9cc37af23ef5ea000"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.303760 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="447e3faf5b21addde533ac153954515a99aad4540b75a7d9cc37af23ef5ea000" err="rpc error: code = NotFound desc = an error occurred when try to find container \"447e3faf5b21addde533ac153954515a99aad4540b75a7d9cc37af23ef5ea000\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.307061 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"74a24e625656230a759a2fa931b60272977b19bbfc2e02ca25de0ddb7c2a2abd\": not found" containerID="74a24e625656230a759a2fa931b60272977b19bbfc2e02ca25de0ddb7c2a2abd"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.307169 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="74a24e625656230a759a2fa931b60272977b19bbfc2e02ca25de0ddb7c2a2abd" err="rpc error: code = NotFound desc = an error occurred when try to find container \"74a24e625656230a759a2fa931b60272977b19bbfc2e02ca25de0ddb7c2a2abd\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.311465 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"fdf5f59e2dde882b711124b9758e5945c513179afdc73ad1e0cd4de071e13026\": not found" containerID="fdf5f59e2dde882b711124b9758e5945c513179afdc73ad1e0cd4de071e13026"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.311577 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="fdf5f59e2dde882b711124b9758e5945c513179afdc73ad1e0cd4de071e13026" err="rpc error: code = NotFound desc = an error occurred when try to find container \"fdf5f59e2dde882b711124b9758e5945c513179afdc73ad1e0cd4de071e13026\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.313676 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"81af9130a4196030a89317c7419735a7e79006d886f95e99a42c393f6c797841\": not found" containerID="81af9130a4196030a89317c7419735a7e79006d886f95e99a42c393f6c797841"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.313787 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="81af9130a4196030a89317c7419735a7e79006d886f95e99a42c393f6c797841" err="rpc error: code = NotFound desc = an error occurred when try to find container \"81af9130a4196030a89317c7419735a7e79006d886f95e99a42c393f6c797841\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.318267 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"ab0ff68f27bd01fe90e127e3b6760289c9009a70289e131955eebd0ebc47129c\": not found" containerID="ab0ff68f27bd01fe90e127e3b6760289c9009a70289e131955eebd0ebc47129c"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.318388 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="ab0ff68f27bd01fe90e127e3b6760289c9009a70289e131955eebd0ebc47129c" err="rpc error: code = NotFound desc = an error occurred when try to find container \"ab0ff68f27bd01fe90e127e3b6760289c9009a70289e131955eebd0ebc47129c\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.319482 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"99864d63eaee7a339c829ef337f2533d87fa477acadd5c04d6087036713ae140\": not found" containerID="99864d63eaee7a339c829ef337f2533d87fa477acadd5c04d6087036713ae140"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.319596 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="99864d63eaee7a339c829ef337f2533d87fa477acadd5c04d6087036713ae140" err="rpc error: code = NotFound desc = an error occurred when try to find container \"99864d63eaee7a339c829ef337f2533d87fa477acadd5c04d6087036713ae140\": not found"
Sep 24 21:34:32 k3s-1 k3s[3394]: E0924 21:34:32.320791 3394 remote_runtime.go:625] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"af2148977c522f9c5276333e5935fb009284c7af616bc9fefd88d882e033d5be\": not found" containerID="af2148977c522f9c5276333e5935fb009284c7af616bc9fefd88d882e033d5be"
Sep 24 21:34:32 k3s-1 k3s[3394]: I0924 21:34:32.320901 3394 kuberuntime_gc.go:361] "Error getting ContainerStatus for containerID" containerID="af2148977c522f9c5276333e5935fb009284c7af616bc9fefd88d882e033d5be" err="rpc error: code = NotFound desc = an error occurred when try to find container \"af2148977c522f9c5276333e5935fb009284c7af616bc9fefd88d882e033d5be\": not found"
Sep 24 21:34:37 k3s-1 k3s[3394]: E0924 21:34:37.447729 3394 secret.go:192] Couldn't get secret metallb-system/memberlist: secret "memberlist" not found
Sep 24 21:34:37 k3s-1 k3s[3394]: E0924 21:34:37.447967 3394 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist podName:cefc73a3-6380-43f3-8b55-90c9beeedae1 nodeName:}" failed. No retries permitted until 2023-09-24 21:35:41.447927032 -0600 MDT m=+146.645340444 (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "memberlist" (UniqueName: "kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist") pod "speaker-wxc6m" (UID: "cefc73a3-6380-43f3-8b55-90c9beeedae1") : secret "memberlist" not found
Sep 24 21:35:01 k3s-1 k3s[3394]: E0924 21:35:01.038365 3394 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.193.1.35 10.193.1.12 10.193.1.35"
Sep 24 21:35:36 k3s-1 k3s[3394]: E0924 21:35:36.572367 3394 kubelet.go:1731] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[memberlist], unattached volumes=[memberlist kube-api-access-tpv7s]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m"
Sep 24 21:35:36 k3s-1 k3s[3394]: E0924 21:35:36.572508 3394 pod_workers.go:965] "Error syncing pod, skipping" err="unmounted volumes=[memberlist], unattached volumes=[memberlist kube-api-access-tpv7s]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m" podUID=cefc73a3-6380-43f3-8b55-90c9beeedae1
Sep 24 21:35:41 k3s-1 k3s[3394]: E0924 21:35:41.544516 3394 secret.go:192] Couldn't get secret metallb-system/memberlist: secret "memberlist" not found
Sep 24 21:35:41 k3s-1 k3s[3394]: E0924 21:35:41.546423 3394 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist podName:cefc73a3-6380-43f3-8b55-90c9beeedae1 nodeName:}" failed. No retries permitted until 2023-09-24 21:37:43.546358301 -0600 MDT m=+268.743771731 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "memberlist" (UniqueName: "kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist") pod "speaker-wxc6m" (UID: "cefc73a3-6380-43f3-8b55-90c9beeedae1") : secret "memberlist" not found
Sep 24 21:36:11 k3s-1 k3s[3394]: E0924 21:36:11.038858 3394 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.193.1.35 10.193.1.12 10.193.1.35"
Sep 24 21:37:43 k3s-1 k3s[3394]: E0924 21:37:43.577589 3394 secret.go:192] Couldn't get secret metallb-system/memberlist: secret "memberlist" not found
Sep 24 21:37:43 k3s-1 k3s[3394]: E0924 21:37:43.577922 3394 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist podName:cefc73a3-6380-43f3-8b55-90c9beeedae1 nodeName:}" failed. No retries permitted until 2023-09-24 21:39:45.577823361 -0600 MDT m=+390.775236858 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "memberlist" (UniqueName: "kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist") pod "speaker-wxc6m" (UID: "cefc73a3-6380-43f3-8b55-90c9beeedae1") : secret "memberlist" not found
Sep 24 21:37:54 k3s-1 k3s[3394]: E0924 21:37:54.040671 3394 kubelet.go:1731] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[memberlist], unattached volumes=[memberlist kube-api-access-tpv7s]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m"
Sep 24 21:37:54 k3s-1 k3s[3394]: E0924 21:37:54.044689 3394 pod_workers.go:965] "Error syncing pod, skipping" err="unmounted volumes=[memberlist], unattached volumes=[memberlist kube-api-access-tpv7s]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m" podUID=cefc73a3-6380-43f3-8b55-90c9beeedae1
Sep 24 21:38:22 k3s-1 k3s[3394]: E0924 21:38:22.044813 3394 dns.go:157] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.193.1.35 10.193.1.12 10.193.1.35"
Sep 24 21:39:45 k3s-1 k3s[3394]: E0924 21:39:45.610639 3394 secret.go:192] Couldn't get secret metallb-system/memberlist: secret "memberlist" not found
Sep 24 21:39:45 k3s-1 k3s[3394]: E0924 21:39:45.610944 3394 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist podName:cefc73a3-6380-43f3-8b55-90c9beeedae1 nodeName:}" failed. No retries permitted until 2023-09-24 21:41:47.610881377 -0600 MDT m=+512.808294793 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "memberlist" (UniqueName: "kubernetes.io/secret/cefc73a3-6380-43f3-8b55-90c9beeedae1-memberlist") pod "speaker-wxc6m" (UID: "cefc73a3-6380-43f3-8b55-90c9beeedae1") : secret "memberlist" not found
Sep 24 21:40:11 k3s-1 k3s[3394]: E0924 21:40:11.043026 3394 kubelet.go:1731] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[memberlist], unattached volumes=[kube-api-access-tpv7s memberlist]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m"
Sep 24 21:40:11 k3s-1 k3s[3394]: E0924 21:40:11.044014 3394 pod_workers.go:965] "Error syncing pod, skipping" err="unmounted volumes=[memberlist], unattached volumes=[kube-api-access-tpv7s memberlist]: timed out waiting for the condition" pod="metallb-system/speaker-wxc6m" podUID=cefc73a3-6380-43f3-8b55-90c9beeedae1
On the worker node
Sep 24 21:08:12 k3s-4 k3s[1483]: time="2023-09-24T21:08:12-06:00" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Sep 24 21:08:12 k3s-4 k3s[1483]: time="2023-09-24T21:08:12-06:00" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/3cdacaf539fc388d8e542a8d643948e3c7bfa4a7e91b7521102325e0ce8581b6"
Sep 24 21:08:17 k3s-4 k3s[1483]: time="2023-09-24T21:08:17-06:00" level=info msg="Starting k3s agent v1.25.12+k3s1 (7515237f)"
Sep 24 21:08:17 k3s-4 k3s[1483]: time="2023-09-24T21:08:17-06:00" level=info msg="Adding server to load balancer k3s-agent-load-balancer: 10.193.20.10:6443"
Sep 24 21:08:17 k3s-4 k3s[1483]: time="2023-09-24T21:08:17-06:00" level=info msg="Running load balancer k3s-agent-load-balancer 127.0.0.1:6444 -> [10.193.20.10:6443] [default: 10.193.20.10:6443]"
Sep 24 21:08:23 k3s-4 k3s[1483]: time="2023-09-24T21:08:23-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:59380->127.0.0.1:6444: read: connection reset by peer"
Sep 24 21:08:31 k3s-4 k3s[1483]: time="2023-09-24T21:08:31-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:58798->127.0.0.1:6444: read: connection reset by peer"
Sep 24 21:08:39 k3s-4 k3s[1483]: time="2023-09-24T21:08:39-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:50748->127.0.0.1:6444: read: connection reset by peer"
Sep 24 21:08:48 k3s-4 k3s[1483]: time="2023-09-24T21:08:48-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:50776->127.0.0.1:6444: read: connection reset by peer"
Sep 24 21:08:56 k3s-4 k3s[1483]: time="2023-09-24T21:08:56-06:00" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:50578->127.0.0.1:6444: read: connection reset by peer"
Possible Solution
- I've checked the General Troubleshooting Guide
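Beyond the troubleshooting guide, a minimal diagnostic sketch for this symptom (run on a master node; the VIP and interface values come from this issue, and the checks print a result instead of aborting). The agent log's "failed to get CA certs ... connection reset by peer" means the agent's local load balancer could not reach the apiserver endpoint, so the second check reproduces that path directly:

```shell
# Values from this issue's variables
VIP=10.193.20.10
IFACE=eth0

# 1. Is the VIP bound to the interface? kube-vip adds it on the elected leader master.
ip addr show "$IFACE" 2>/dev/null | grep -q "$VIP" \
  && echo "VIP $VIP present on $IFACE" \
  || echo "VIP $VIP NOT present on $IFACE"

# 2. Can the API server be reached through the VIP? /cacerts is served without auth,
#    which is the same URL the agent's load balancer was failing on.
curl -sk --max-time 5 "https://$VIP:6443/cacerts" >/dev/null \
  && echo "API reachable via VIP" \
  || echo "API NOT reachable via VIP"
```

If check 1 fails on every master, the VIP was never announced (e.g. kube-vip not running, or a subnet mismatch); if check 1 succeeds but check 2 fails from an agent node, the problem is network reachability between the agent and the VIP.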