projectcalico / canal

Policy based networking for cloud native applications

Remote node flannel failed to operate

moonek opened this issue

Expected Behavior

Canal pods run successfully on all nodes.

Current Behavior

kubectl get po -n kube-system -o wide
NAME                                    READY     STATUS             RESTARTS   AGE       IP             NODE
canal-7pplx                            2/3       CrashLoopBackOff   12         46m       172.17.8.102   172.17.8.102
canal-dwfp8                            2/3       CrashLoopBackOff   12         46m       172.17.8.103   172.17.8.103
canal-l84s1                            3/3       Running            0          46m       172.17.8.101   172.17.8.101
kube-apiserver-172.17.8.101            1/1       Running            0          1h        172.17.8.101   172.17.8.101
kube-controller-manager-172.17.8.101   1/1       Running            0          1h        172.17.8.101   172.17.8.101
kube-proxy-172.17.8.101                1/1       Running            0          1h        172.17.8.101   172.17.8.101
kube-proxy-172.17.8.102                1/1       Running            0          1h        172.17.8.102   172.17.8.102
kube-proxy-172.17.8.103                1/1       Running            0          1h        172.17.8.103   172.17.8.103
kube-scheduler-172.17.8.101            1/1       Running            0          1h        172.17.8.101   172.17.8.101

Steps to Reproduce (for bugs)

  1. coreos + k8s cluster install
  2. kubelet configuration (--cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni); flag placement for steps 2 and 3 is sketched after this list
  3. controller manager configuration (--cluster-cidr=10.244.0.0/16 --allocate-node-cidrs=true)
  4. wget https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/canal.yaml
  5. kubectl apply -f canal.yaml
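For reference, a minimal sketch of where the flags from steps 2 and 3 end up; the invocations below are illustrative only and omit every other flag, which are not taken from the original report:

# on every node: kubelet started with the CNI flags from step 2
kubelet \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin

# on the master: controller manager started with the pod CIDR flags from step 3
kube-controller-manager \
  --cluster-cidr=10.244.0.0/16 \
  --allocate-node-cidrs=true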

Context

Tested on a local VirtualBox cluster.
The canal pod on the same node as the apiserver runs normally (3/3 Running).
The canal pods on the remote nodes fail (2/3 CrashLoopBackOff).
The flannel container log shows the following:

kubectl logs -f canal-dwfp8 -n kube-system -c kube-flannel
I0712 06:20:19.618913       1 main.go:459] Using interface with name eth1 and address 172.17.8.103
I0712 06:20:19.619093       1 main.go:476] Defaulting external address to interface address (172.17.8.103)
E0712 06:20:49.622290       1 main.go:223] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/canal-dwfp8': Get https://10.3.0.1:443/api/v1/namespaces/kube-system/pods/canal-dwfp8: dial tcp 10.3.0.1:443: i/o timeout

Changing "k8s_api_root" in canal.yaml to "https://172.17.8.101:443" makes no difference.
At this point flannel has not yet created the overlay network, so how is flannel supposed to reach the 10.3.0.1 address at all?

Your Environment

  • Vagrant + VirtualBox (MASTER: 172.17.8.101, WORKERS: 172.17.8.102, 172.17.8.103)
  • Calico version: 1.2.1
  • Flannel version: 0.8.0
  • Orchestrator version: k8s 1.6.4 (no rbac mode)
  • Operating System and version: Container Linux by CoreOS 1437.0.0

I'm guessing that 10.3.0.1 is the cluster IP of the kubernetes service, and that kube-proxy should be setting up iptables rules that DNAT anything sent to 10.3.0.1, rewriting the packet's destination address to 172.17.8.101. You can check that this is true with kubectl get services --all-namespaces.
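A quick way to verify this, assuming kube-proxy runs in iptables mode (the commands below are a sketch, not taken from the original report):

kubectl get services --all-namespaces
# the "kubernetes" service in the default namespace should list 10.3.0.1 as its CLUSTER-IP

# on a worker node, dump the NAT table and look for the rules kube-proxy programs for that VIP
sudo iptables -t nat -S | grep 10.3.0.1
# should show a KUBE-SERVICES rule matching -d 10.3.0.1/32 --dport 443 that jumps into the
# KUBE-SVC-*/KUBE-SEP-* chains where the DNAT to the real apiserver address happens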

You should check from a worker that you are able to curl https://10.3.0.1. In a test cluster I was able to connect with curl, though (unsurprisingly) I received curl: (60) SSL certificate problem: self signed certificate.... I'm imagining that isn't working for you, so you should also try curl https://172.17.8.101:443. If that doesn't work either, try pinging between the hosts to verify that some kind of communication is possible from the worker to the master.
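Putting those checks together, something like the following run from one of the workers; the /version path and timeouts are just illustrative, and any HTTP-level response (even a certificate or authorization error) proves connectivity, while a timeout reproduces the problem:

# run from a worker node, e.g. 172.17.8.102
curl -k --connect-timeout 5 https://10.3.0.1/version           # via the service VIP (relies on kube-proxy DNAT)
curl -k --connect-timeout 5 https://172.17.8.101:443/version   # directly to the master
ping -c 3 172.17.8.101                                         # basic reachability between the hosts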

I brought up a test cluster using the Vagrantfile and master/node-config.yaml from http://docs.projectcalico.org/v2.3/getting-started/kubernetes/installation/vagrant/, only updating --cluster-cidr for the controller manager, and was able to bring up canal with the manifest you linked. There have been updates to that manifest since you opened this issue, though, so you may want to try installing canal again with the latest version; that may resolve the issue.

Blfrg commented

I have been experiencing a similar issue, though my setup differs in that it is k8s 1.7.x with RBAC.

What resolved the issue for me was adding the following flag to the kube-apiserver:
--advertise-address=172.17.8.101

Note: k8s_api_root still resolved to the 10.x.x.x address, but both flannel and calico were then able to communicate with the master node.
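For anyone hitting the same thing, a minimal sketch of how to confirm the effect of that flag; the endpoint check below and the root-cause guess (the apiserver advertising an unreachable interface) are assumptions, not from the original reports:

# check which address the apiserver advertises to the cluster
kubectl get endpoints kubernetes
# on Vagrant/VirtualBox setups this often shows the NAT interface (e.g. 10.0.2.15), which the
# workers cannot reach; after adding --advertise-address=172.17.8.101 to the kube-apiserver
# command line and restarting it, this should show 172.17.8.101, so traffic DNATed from the
# 10.3.0.1 service VIP lands on a reachable address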

Hope that helps!

@Blfrg Thanks for reporting your solution. 👍

@moonek have you been able to try @Blfrg's solution or found another solution for your issue?

Closing this issue as stale.