hobby-kube / guide

Kubernetes clusters for the hobbyist.

Issues with cluster on Hetzner cloud - Pods stuck in "creating container"

vistalba opened this issue

Hi all

First: Thank you for this nice hobby-kube!!! :-)

I built one on 3 Hetzner Cloud VMs today (twice: first on Ubuntu 18.04, then on Ubuntu 16.04).
I used this guide https://github.com/hobby-kube/guide to build it manually.

Whenever I deploy something it hangs in "ContainerCreating", and after some time I can see this error message:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused
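As far as I understand, port 6784 is Weave Net's local HTTP status API, so a direct request on the affected node should show whether the router is up at all (a quick check, assuming the standard Weave endpoint):

# Assumption: Weave Net answers with a status summary on its local API port.
# A "connection refused" here matches the sandbox error above.
curl http://127.0.0.1:6784/status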

Maybe I did something wrong, or is there something missing in the guide?

Any help is much appreciated.

Some output... hope it helps:

root@kube01 ~/deployments # kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
kube01    Ready     master    33m       v1.10.2
kube02    Ready     <none>    29m       v1.10.2
kube03    Ready     <none>    29m       v1.10.2

root@kube01 ~/deployments # kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY     STATUS              RESTARTS   AGE
kube-system   kube-apiserver-kube01                   1/1       Running             0          32m
kube-system   kube-controller-manager-kube01          1/1       Running             0          33m
kube-system   kube-dns-86f4d74b45-r9sbg               3/3       Running             0          33m
kube-system   kube-proxy-4cg67                        1/1       Running             0          30m
kube-system   kube-proxy-m7nmc                        1/1       Running             0          33m
kube-system   kube-proxy-xc729                        1/1       Running             0          30m
kube-system   kube-scheduler-kube01                   1/1       Running             0          33m
kube-system   kubernetes-dashboard-7f87cb5646-6qfp7   0/1       ContainerCreating   0          26m
kube-system   weave-net-kkbvj                         2/2       Running             0          6m
kube-system   weave-net-p2q5s                         2/2       Running             0          6m
kube-system   weave-net-sw7tz                         2/2       Running             0          6m

root@kube01 ~/deployments # kubectl describe pod -n kube-system kubernetes-dashboard-7f87cb5646-6qfp7
Name:           kubernetes-dashboard-7f87cb5646-6qfp7
Namespace:      kube-system
Node:           kube03/88.198.93.160
Start Time:     Tue, 08 May 2018 22:17:29 +0200
Labels:         app=kubernetes-dashboard
                pod-template-hash=3943761202
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/kubernetes-dashboard-7f87cb5646
Containers:
  kubernetes-dashboard:
    Container ID:
    Image:          gcr.io/google_containers/kubernetes-dashboard-amd64:v1.8.3
    Image ID:
    Port:           9090/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lvlj7 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-lvlj7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lvlj7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   SuccessfulMountVolume   27m                kubelet, kube03    MountVolume.SetUp succeeded for volume "default-token-lvlj7"
  Normal   Scheduled               27m                default-scheduler  Successfully assigned kubernetes-dashboard-7f87cb5646-6qfp7 to kube03
  Warning  FailedCreatePodSandBox  19m (x2 over 23m)  kubelet, kube03    Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/889d732df3746c0516c9d8616b5b96046911b0ee6593ec21db5f3121f3a26046: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/571392aa28b5a762e58043bb2b6e3e3683c9a5336a76112c2cd75eeab1ef7564: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/9544ecb695bdd48ce0b4f580d44514c38ab1209f42c64fbb19266d40a49f7579: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/183eba1061d3a3547c23dec81814650c419a04a4b1d52a2dab8c0e27c823eb1e: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/f12e5320b522996ed4937ad3ec64e255fe53125a61941e28541110bdb070bf68: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/208a57ca1209b449b577d7244078a27f8235f776fdc4e42cb770cc3dfb93f427: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/c87965c136f9c033e716234c6f658e0350cbfaa24f3541399eec300e800ad062: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  15m                kubelet, kube03    Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "kubernetes-dashboard-7f87cb5646-6qfp7_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/76e1d121d2aedd44c3652fd285428241770a7ae2c46dc26bab853c05a025c84b: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  FailedCreatePodSandBox  11m (x4 over 15m)  kubelet, kube03    (combined from similar events): Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          7m (x28 over 23m)  kubelet, kube03    Pod sandbox changed, it will be killed and re-created.

Some additional info: kube03 looks identical to kube02, except its IP is 10.0.1.3.

root@kube01 ~ # ping -c 2 10.0.1.2
PING 10.0.1.2 (10.0.1.2) 56(84) bytes of data.
64 bytes from 10.0.1.2: icmp_seq=1 ttl=64 time=0.673 ms
64 bytes from 10.0.1.2: icmp_seq=2 ttl=64 time=0.667 ms

--- 10.0.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.667/0.670/0.673/0.003 ms
root@kube01 ~ # ping -c 2 10.0.1.3
PING 10.0.1.3 (10.0.1.3) 56(84) bytes of data.
64 bytes from 10.0.1.3: icmp_seq=1 ttl=64 time=0.770 ms
64 bytes from 10.0.1.3: icmp_seq=2 ttl=64 time=0.701 ms

--- 10.0.1.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.701/0.735/0.770/0.043 ms

root@kube01 ~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.31.1.1      0.0.0.0         UG        0 0          0 eth0
10.0.1.2        0.0.0.0         255.255.255.255 UH        0 0          0 wg0
10.0.1.3        0.0.0.0         255.255.255.255 UH        0 0          0 wg0
10.32.0.0       0.0.0.0         255.240.0.0     U         0 0          0 weave
10.96.0.0       0.0.0.0         255.255.0.0     U         0 0          0 wg0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.31.1.1      0.0.0.0         255.255.255.255 UH        0 0          0 eth0

root@kube02 ~ # ping -c 2 10.0.1.1
PING 10.0.1.1 (10.0.1.1) 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=1 ttl=64 time=0.817 ms
64 bytes from 10.0.1.1: icmp_seq=2 ttl=64 time=0.775 ms

--- 10.0.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.775/0.796/0.817/0.021 ms

root@kube02 ~ # netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.31.1.1      0.0.0.0         UG        0 0          0 eth0
10.0.1.1        0.0.0.0         255.255.255.255 UH        0 0          0 wg0
10.0.1.3        0.0.0.0         255.255.255.255 UH        0 0          0 wg0
10.96.0.0       0.0.0.0         255.255.0.0     U         0 0          0 wg0
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.31.1.1      0.0.0.0         255.255.255.255 UH        0 0          0 eth0

The weave-net agent doesn't seem to be running on kube02; at least the route was not added.

You can check this with:

$ kubectl -n kube-system get pods -o wide
NAME                                    READY     STATUS    RESTARTS   AGE       IP           NODE
kube-apiserver-kube1                    1/1       Running   6          41d       10.0.1.1     kube1
kube-controller-manager-kube1           1/1       Running   1          41d       10.0.1.1     kube1
kube-dns-86f4d74b45-5cl7j               3/3       Running   3          41d       10.32.0.9    kube1
kube-proxy-cjzzl                        1/1       Running   3          41d       10.0.1.3     kube3
kube-proxy-pz4qb                        1/1       Running   1          41d       10.0.1.1     kube1
kube-proxy-tfhct                        1/1       Running   1          41d       10.0.1.2     kube2
kube-scheduler-kube1                    1/1       Running   1          41d       10.0.1.1     kube1
weave-net-qcxlk                         2/2       Running   9          41d       10.0.1.3     kube3
weave-net-w7z68                         2/2       Running   3          41d       10.0.1.1     kube1
weave-net-w9tfj                         2/2       Running   4          41d       10.0.1.2     kube2
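If the weave-net pod shows as Running on kube02 but the route is still missing, the weave container logs usually say why (a sketch, assuming the stock name=weave-net label from the Weave DaemonSet manifest):

# Tail the router logs of every weave-net pod.
kubectl -n kube-system logs -l name=weave-net -c weave --tail=50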

Okay... I installed a new cluster today and see the same behavior. If I disable the firewall with "ufw default allow incoming && ufw reload" it works.
So there must be a missing rule :( Unfortunately I don't know which one :P
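My best guess would be the Weave Net peer ports (TCP 6783 and UDP 6783-6784, per the Weave docs). Assuming the peer traffic goes over wg0, something like this might be the missing piece, though I haven't verified it:

# Unverified sketch: allow Weave's control (TCP) and data (UDP) ports on the
# WireGuard interface only, then reload ufw.
ufw allow in on wg0 to any port 6783 proto tcp
ufw allow in on wg0 to any port 6783:6784 proto udp
ufw reload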

One more strange thing...

root@kube01:~/kubeconf# kubectl -n kube-system get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-apiserver-kube01            1/1       Running   0          27m       46.101.xxx.xx3   kube01
kube-controller-manager-kube01   1/1       Running   0          27m       46.101.xxx.xx3   kube01
kube-dns-86f4d74b45-z4z9k        3/3       Running   0          28m       10.32.0.2        kube01
kube-proxy-828xk                 1/1       Running   0          24m       46.101.xxx.xx0    kube02
kube-proxy-fgxxs                 1/1       Running   0          24m       167.99.xxx.xx9    kube03
kube-proxy-rj22s                 1/1       Running   0          28m       46.101.xxx.xx3   kube01
kube-scheduler-kube01            1/1       Running   0          27m       46.101.xxx.xx3   kube01
weave-net-2qrk5                  2/2       Running   0          24m       167.99.xxx.xx9    kube03
weave-net-vt79f                  2/2       Running   0          24m       46.101.xxx.xx0    kube02
weave-net-z7br4                  2/2       Running   0          26m       46.101.xxx.xx3   kube01

In your example you can see the private IPs of the hosts. On my cluster they are the public ones :/ Why does this happen?

# /tmp/master-configuration.yml
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 10.0.1.1
apiServerExtraArgs:
  service-node-port-range: 7000-20000
etcd:
  endpoints:
  - http://10.0.1.1:2379
  - http://10.0.1.2:2379
  - http://10.0.1.3:2379
apiServerCertSANs:
  - 46.101.xxx.xx3
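I applied it with the usual kubeadm call from the guide:

# Bootstrap the master from the config file above.
kubeadm init --config /tmp/master-configuration.yml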

This is strange indeed. I've never run into this problem before. Did you try provisioning using Terraform?

No. I don't know how to do that with Terraform; I've never used it before and I can't follow your guide, it's too high-level for me. Funnily enough, the kubeadm join command uses the private IP to connect. :-(

I ran into the same problem before; the solution for me was to add the --node-ip flag to the kubelet service configuration (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf).

You need to add the following line to 10-kubeadm.conf, changing the address to the host-specific WireGuard address on each node. After that, restart the kubelet service:

Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.1.1"

Full Example:

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
Environment="KUBELET_EXTRA_ARGS=--node-ip=10.0.1.1"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
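Then reload systemd and restart the kubelet on each node; afterwards the nodes should re-register with their WireGuard IPs:

systemctl daemon-reload
systemctl restart kubelet

# The IP column should now show the 10.0.1.x addresses instead of the public ones.
kubectl get nodes -o wide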

@kgierke works perfectly for me too! :D Thank you!
I've also re-enabled the ufw firewall now.

So... the only thing left for me is to get the Traefik DaemonSet running with LE certs ;)

@vistalba: let me know how it goes with Traefik and LE, and please share some config. I did not manage to get it working; it did not bind to port 443 - probably a configuration issue.

Also, could you please check the load on the cluster when idle? I have a single-node cluster on a Hetzner CX51 VM and see a system load of 0.5 right after installing Kubernetes.

All details are in kubernetes/kubernetes#63951. I would love to know if anyone has the same issue.
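For reference, plain uptime and top are enough to sample the idle load (standard tools, nothing cluster-specific):

# Load averages are the last three numbers on the uptime line.
uptime
# One-shot snapshot of the busiest processes:
top -bn1 | head -n 15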