Issues with cluster on Hetzner cloud - Pods stuck in "creating container"
vistalba opened this issue · comments
Hi all
First off: thank you for this nice hobby-kube!!! :-)
I built one today on three Hetzner Cloud VMs (twice, in fact: first on Ubuntu 18.04, then on Ubuntu 16.04), following the guide at https://github.com/hobby-kube/guide to set it up manually.
Whenever I deploy something, it hangs in "ContainerCreating", and after some time I see this error message:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused
Maybe I did something wrong, or is something missing from the guide?
Any help is much appreciated.
Some output... hope it helps:
root@kube01 ~/deployments # kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube01 Ready master 33m v1.10.2
kube02 Ready <none> 29m v1.10.2
kube03 Ready <none> 29m v1.10.2
root@kube01 ~/deployments # kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-apiserver-kube01 1/1 Running 0 32m
kube-system kube-controller-manager-kube01 1/1 Running 0 33m
kube-system kube-dns-86f4d74b45-r9sbg 3/3 Running 0 33m
kube-system kube-proxy-4cg67 1/1 Running 0 30m
kube-system kube-proxy-m7nmc 1/1 Running 0 33m
kube-system kube-proxy-xc729 1/1 Running 0 30m
kube-system kube-scheduler-kube01 1/1 Running 0 33m
kube-system kubernetes-dashboard-7f87cb5646-6qfp7 0/1 ContainerCreating 0 26m
kube-system weave-net-kkbvj 2/2 Running 0 6m
kube-system weave-net-p2q5s 2/2 Running 0 6m
kube-system weave-net-sw7tz 2/2 Running 0 6m
root@kube01 ~/deployments # kubectl describe pod -n kube-system kubernetes-dashboard-7f87cb5646-6qfp7
Name: kubernetes-dashboard-7f87cb5646-6qfp7
Namespace: kube-system
Node: kube03/88.198.93.160
Start Time: Tue, 08 May 2018 22:17:29 +0200
Labels: app=kubernetes-dashboard
pod-template-hash=3943761202
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/kubernetes-dashboard-7f87cb5646
Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.8.3
Image ID:
Port: 9090/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lvlj7 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-lvlj7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-lvlj7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 27m kubelet, kube03 MountVolume.SetUp succeeded for volume "default-token-lvlj7"
Normal Scheduled 27m default-scheduler Successfully assigned kubernetes-dashboard-7f87cb5646-6qfp7 to kube03
Warning FailedCreatePodSandBox 19m (x2 over 23m) kubelet, kube03 Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/889d732df3746c0516c9d8616b5b96046911b0ee6593ec21db5f3121f3a26046: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/571392aa28b5a762e58043bb2b6e3e3683c9a5336a76112c2cd75eeab1ef7564: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/9544ecb695bdd48ce0b4f580d44514c38ab1209f42c64fbb19266d40a49f7579: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/183eba1061d3a3547c23dec81814650c419a04a4b1d52a2dab8c0e27c823eb1e: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/f12e5320b522996ed4937ad3ec64e255fe53125a61941e28541110bdb070bf68: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/208a57ca1209b449b577d7244078a27f8235f776fdc4e42cb770cc3dfb93f427: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/c87965c136f9c033e716234c6f658e0350cbfaa24f3541399eec300e800ad062: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 15m kubelet, kube03 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused
Warning FailedCreatePodSandBox 11m (x4 over 15m) kubelet, kube03 (combined from similar events): Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal SandboxChanged 7m (x28 over 23m) kubelet, kube03 Pod sandbox changed, it will be killed and re-created.
Some additional info: kube03 looks identical to kube02, except its IP is 10.0.1.3.
root@kube01 ~ # ping -c 2 10.0.1.2
PING 10.0.1.2 (10.0.1.2) 56(84) bytes of data.
64 bytes from 10.0.1.2: icmp_seq=1 ttl=64 time=0.673 ms
64 bytes from 10.0.1.2: icmp_seq=2 ttl=64 time=0.667 ms
--- 10.0.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.667/0.670/0.673/0.003 ms
root@kube01 ~ # ping -c 2 10.0.1.3
PING 10.0.1.3 (10.0.1.3) 56(84) bytes of data.
64 bytes from 10.0.1.3: icmp_seq=1 ttl=64 time=0.770 ms
64 bytes from 10.0.1.3: icmp_seq=2 ttl=64 time=0.701 ms
--- 10.0.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.701/0.735/0.770/0.043 ms
root@kube01 ~ # netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.31.1.1 0.0.0.0 UG 0 0 0 eth0
10.0.1.2 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
10.0.1.3 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
10.32.0.0 0.0.0.0 255.240.0.0 U 0 0 0 weave
10.96.0.0 0.0.0.0 255.255.0.0 U 0 0 0 wg0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.31.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
root@kube02 ~ # ping -c 2 10.0.1.1
PING 10.0.1.1 (10.0.1.1) 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=1 ttl=64 time=0.817 ms
64 bytes from 10.0.1.1: icmp_seq=2 ttl=64 time=0.775 ms
--- 10.0.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.775/0.796/0.817/0.021 ms
root@kube02 ~ # netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.31.1.1 0.0.0.0 UG 0 0 0 eth0
10.0.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
10.0.1.3 0.0.0.0 255.255.255.255 UH 0 0 0 wg0
10.96.0.0 0.0.0.0 255.255.0.0 U 0 0 0 wg0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.31.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
The weave-net agent doesn't seem to be running on kube02; at least the route was not added. You can check this with:
$ kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-apiserver-kube1 1/1 Running 6 41d 10.0.1.1 kube1
kube-controller-manager-kube1 1/1 Running 1 41d 10.0.1.1 kube1
kube-dns-86f4d74b45-5cl7j 3/3 Running 3 41d 10.32.0.9 kube1
kube-proxy-cjzzl 1/1 Running 3 41d 10.0.1.3 kube3
kube-proxy-pz4qb 1/1 Running 1 41d 10.0.1.1 kube1
kube-proxy-tfhct 1/1 Running 1 41d 10.0.1.2 kube2
kube-scheduler-kube1 1/1 Running 1 41d 10.0.1.1 kube1
weave-net-qcxlk 2/2 Running 9 41d 10.0.1.3 kube3
weave-net-w7z68 2/2 Running 3 41d 10.0.1.1 kube1
weave-net-w9tfj 2/2 Running 4 41d 10.0.1.2 kube2
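To dig further when a weave-net pod shows Running but the route is missing, a couple of checks may help (the pod name here is taken from the listing above; substitute your own, and run the curl on the affected node):

```shell
# Logs of the Weave router container in the pod scheduled on the broken node
kubectl -n kube-system logs weave-net-w9tfj -c weave

# On the node itself: query Weave's local HTTP API. This is the same
# endpoint the CNI plugin calls when it fails with "connection refused"
# on 127.0.0.1:6784, so it only responds once the router is healthy.
curl -s http://127.0.0.1:6784/status
```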
Okay... I installed a new cluster today and see the same behavior. If I disable the firewall with "ufw default allow incoming && ufw reload", it works.
So there must be a missing rule :( Unfortunately I don't know which one :P
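For reference, a hedged sketch of ufw rules that should cover Weave Net's inter-node traffic, assuming the WireGuard interface is named wg0 as elsewhere in this thread. Weave peers talk over TCP 6783 plus UDP 6783-6784; the "connection refused" on 127.0.0.1:6784 is Weave's local API, which only comes up once the router can reach its peers.

```shell
# Assumptions: ufw is the firewall in use and wg0 is the WireGuard interface.
ufw allow in on wg0      # trust traffic arriving over the VPN mesh
ufw allow in on weave    # pod traffic on the Weave bridge
ufw allow 6783/tcp       # Weave control channel between peers
ufw allow 6783:6784/udp  # Weave data channel (sleeve / fast datapath)
ufw reload
```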
One more strange thing...
root@kube01:~/kubeconf# kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-apiserver-kube01 1/1 Running 0 27m 46.101.xxx.xx3 kube01
kube-controller-manager-kube01 1/1 Running 0 27m 46.101.xxx.xx3 kube01
kube-dns-86f4d74b45-z4z9k 3/3 Running 0 28m 10.32.0.2 kube01
kube-proxy-828xk 1/1 Running 0 24m 46.101.xxx.xx0 kube02
kube-proxy-fgxxs 1/1 Running 0 24m 167.99.xxx.xx9 kube03
kube-proxy-rj22s 1/1 Running 0 28m 46.101.xxx.xx3 kube01
kube-scheduler-kube01 1/1 Running 0 27m 46.101.xxx.xx3 kube01
weave-net-2qrk5 2/2 Running 0 24m 167.99.xxx.xx9 kube03
weave-net-vt79f 2/2 Running 0 24m 46.101.xxx.xx0 kube02
weave-net-z7br4 2/2 Running 0 26m 46.101.xxx.xx3 kube01
In your example you can see the private IPs of your hosts; on my cluster they are the public ones :/ Why does this happen?
# /tmp/master-configuration.yml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 10.0.1.1
apiServerExtraArgs:
  service-node-port-range: 7000-20000
etcd:
  endpoints:
  - http://10.0.1.1:2379
  - http://10.0.1.2:2379
  - http://10.0.1.3:2379
apiServerCertSANs:
  - 46.101.xxx.xx3
This is strange indeed. I've never run into this problem before. Did you try provisioning with Terraform?
No, I don't know how to do that with Terraform. I've never used it before and can't follow your guide; it's too high-level for me. The funny thing is that the kubeadm join command uses the private IP to connect. :-(
I was running into the same problem before; the solution for me was to add the --node-ip flag to the kubelet service configuration (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf).
You need to add the following line to 10-kubeadm.conf, changing the address to the host-specific WireGuard address on each node, and then restart the kubelet service:
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.1.1"
Full Example:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.1.1"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
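A minimal sketch of scripting that edit with sed, operating on a local copy here so it can be tried safely (the sample file content is abbreviated; on a real node you would point CONF at the path above and afterwards run `systemctl daemon-reload && systemctl restart kubelet`):

```shell
# Work on a throwaway copy of the drop-in; the real path is
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
CONF=10-kubeadm.conf
cat > "$CONF" <<'EOF'
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_EXTRA_ARGS
EOF

NODE_IP=10.0.1.1   # use 10.0.1.2 / 10.0.1.3 on the other hosts

# Insert the extra-args line just before the (empty) ExecStart reset line
sed -i "s|^ExecStart=\$|Environment=\"KUBELET_EXTRA_ARGS=--node-ip=${NODE_IP}\"\nExecStart=|" "$CONF"

grep -- '--node-ip' "$CONF"
```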
@kgierke Works perfectly for me too! :D Thank you!
I've also re-enabled the ufw firewall now.
So... the only thing left for me is to get the Traefik DaemonSet running with LE certs ;)
@vistalba: let me know how it goes with Traefik and LE, and please share some config. I did not manage to get it working; it didn't bind to port 443 - probably a configuration issue.
Also, could you please check the load on the cluster when idle? I have a single-node cluster on a Hetzner CX51 VM and get a system load of 0.5 right after installing Kubernetes.
All details are in kubernetes/kubernetes#63951. I would love to know if anyone else has the same issues.