# Remote node flannel failed to operate
moonek opened this issue · comments
## Expected Behavior

All canal pods reach 3/3 Running.
## Current Behavior

```
kubectl get po -n kube-system -o wide
NAME                                   READY   STATUS             RESTARTS   AGE   IP             NODE
canal-7pplx                            2/3     CrashLoopBackOff   12         46m   172.17.8.102   172.17.8.102
canal-dwfp8                            2/3     CrashLoopBackOff   12         46m   172.17.8.103   172.17.8.103
canal-l84s1                            3/3     Running            0          46m   172.17.8.101   172.17.8.101
kube-apiserver-172.17.8.101            1/1     Running            0          1h    172.17.8.101   172.17.8.101
kube-controller-manager-172.17.8.101   1/1     Running            0          1h    172.17.8.101   172.17.8.101
kube-proxy-172.17.8.101                1/1     Running            0          1h    172.17.8.101   172.17.8.101
kube-proxy-172.17.8.102                1/1     Running            0          1h    172.17.8.102   172.17.8.102
kube-proxy-172.17.8.103                1/1     Running            0          1h    172.17.8.103   172.17.8.103
kube-scheduler-172.17.8.101            1/1     Running            0          1h    172.17.8.101   172.17.8.101
```
## Steps to Reproduce (for bugs)

- Install a CoreOS + Kubernetes cluster
- Configure the kubelet with `--cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --network-plugin=cni`
- Configure the controller manager with `--cluster-cidr=10.244.0.0/16 --allocate-node-cidrs=true`
- `wget https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/canal.yaml`
- `kubectl apply -f canal.yaml`
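For reference, the steps above can be sketched as follows. This is illustrative only: the flag values come from this report, but the systemd drop-in path is an assumption and the exact unit layout varies by install.

```shell
# kubelet flags (e.g. via a systemd drop-in such as
# /etc/systemd/system/kubelet.service.d/10-cni.conf -- path is illustrative):
#   --cni-conf-dir=/etc/cni/net.d
#   --cni-bin-dir=/opt/cni/bin
#   --network-plugin=cni

# kube-controller-manager flags (flannel needs per-node pod CIDRs allocated):
#   --cluster-cidr=10.244.0.0/16
#   --allocate-node-cidrs=true

# Fetch and apply the canal manifest
wget https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/canal.yaml
kubectl apply -f canal.yaml
```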
## Context

Tested locally with Vagrant + VirtualBox.

The canal pod on the same node as the apiserver runs normally (3/3 Running), but the canal pods on the remote nodes fail (2/3 CrashLoopBackOff). The kube-flannel container's log shows:

```
kubectl logs -f canal-dwfp8 -n kube-system -c kube-flannel
I0712 06:20:19.618913       1 main.go:459] Using interface with name eth1 and address 172.17.8.103
I0712 06:20:19.619093       1 main.go:476] Defaulting external address to interface address (172.17.8.103)
E0712 06:20:49.622290       1 main.go:223] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/canal-dwfp8': Get https://10.3.0.1:443/api/v1/namespaces/kube-system/pods/canal-dwfp8: dial tcp 10.3.0.1:443: i/o timeout
```

Setting `"k8s_api_root": "https://172.17.8.101:443"` in canal.yaml makes no difference. When flannel starts there is no overlay network yet, so how is flannel supposed to reach the 10.3.0.1 address?
## Your Environment

- Vagrant + VirtualBox (master: 172.17.8.101, workers: 172.17.8.102, 172.17.8.103)
- Calico version: 1.2.1
- Flannel version: 0.8.0
- Orchestrator version: Kubernetes 1.6.4 (RBAC disabled)
- Operating System and version: Container Linux by CoreOS 1437.0.0
I'm guessing that 10.3.0.1 is the ClusterIP of the `kubernetes` service. kube-proxy should be setting up iptables rules that DNAT anything sent to 10.3.0.1, rewriting the packet's destination address to 172.17.8.101. You can confirm that with `kubectl get services --all-namespaces`.

You should then check that a worker can reach `https://10.3.0.1` with curl. In a test cluster I was able to connect, though (unsurprisingly) I received `curl: (60) SSL certificate problem: self signed certificate...`. I'm imagining that isn't working for you, so you should also try `curl https://172.17.8.101:443`. If that doesn't work either, try pinging between the hosts to verify that some type of communication is possible from the worker to the master.
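Run together, the checks above narrow down where connectivity breaks. A sketch, to be run from a worker node; the 10.3.0.1 and 172.17.8.101 addresses come from this report, and `-k` just skips the expected self-signed-certificate error:

```shell
# 1. Confirm 10.3.0.1 is the ClusterIP of the kubernetes service
kubectl get services --all-namespaces | grep kubernetes

# 2. Can the worker reach the service VIP? kube-proxy's DNAT rules
#    should rewrite 10.3.0.1:443 to the apiserver's real address.
curl -k https://10.3.0.1

# 3. Bypass the VIP and hit the master directly
curl -k https://172.17.8.101:443

# 4. If that also fails, check basic worker -> master connectivity
ping -c 3 172.17.8.101

# Optional (kube-proxy in iptables mode): inspect the rules for the VIP
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.3.0.1
```

If step 2 fails but step 3 succeeds, the problem is kube-proxy's service routing rather than host networking.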
I brought up a test cluster using the Vagrantfile and master/node-config.yaml from http://docs.projectcalico.org/v2.3/getting-started/kubernetes/installation/vagrant/, only updating `--cluster-cidr` for the controller manager, and was able to bring up canal with the manifest you linked. There have been updates to that manifest since you opened this issue, though, so you might want to try installing canal again with the latest version; that may resolve the issue.
I have been experiencing a similar issue, though my setup differs in that it is k8s 1.7.x with RBAC enabled.

What resolved the issue for me was adding the flag `--advertise-address=172.17.8.101` to the kube-apiserver.

Note: `k8s_api_root` still resolved to the 10.x.x.x address, but both flannel and calico were then able to communicate with the master node.
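For anyone applying the same fix on a static-pod install, the flag goes on the kube-apiserver command line. A minimal excerpt, assuming a static-pod manifest; the file path and surrounding fields are illustrative and vary by install:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (illustrative path)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --advertise-address=172.17.8.101   # an IP the workers can actually reach
    # ... existing flags unchanged ...
```

The advertised address is what gets published as the endpoint of the `kubernetes` service, so it must be reachable from the worker nodes.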
Hope that helps!
@Blfrg Thanks for reporting your solution. 👍
Closing this issue as stale.