Canal on GCE using CoreOS does not work
mattymo opened this issue
Expected Behavior
Pods should be able to ping each other
Current Behavior
All services report healthy, but the flannel container in the canal pod constantly logs the "L3 miss" error quoted under Context.
Possible Solution
Steps to Reproduce (for bugs)
- Create CoreOS instances in GCE
- Set up an Ansible inventory such as this:
    k8s-mattymo-test2-1 ansible_ssh_host=104.199.90.98
    k8s-mattymo-test2-2 ansible_ssh_host=35.195.129.28
    [kube-master]
    k8s-mattymo-test2-1
    [kube-node]
    k8s-mattymo-test2-2
    [etcd]
    k8s-mattymo-test2-1
    [k8s-cluster:children]
    kube-node
    kube-master
- Run Ansible (kubespray) with -e kube_network_plugin=canal
- Try to ping pod IPs from any host, or from one pod to another.
Context
Pod logs:
I don't have full flannel logs at the moment, but this type of message repeats constantly:
5 vxlan_network.go:241] L3 miss but route for 10.233.95.3 not found
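When only excerpts of the flannel logs are available, it can help to see which destination IPs the repeated message refers to. The Python below is a minimal sketch for that, not part of kubespray, Canal or flannel: it assumes the logs have been saved as plain text, and both the helper name `count_l3_misses` and the regex are hypothetical, written only against the single message format quoted above.

```python
import re
from collections import Counter

# Hypothetical pattern built from the one flannel message quoted above:
# "vxlan_network.go:241] L3 miss but route for 10.233.95.3 not found".
# Other flannel versions may word this differently (assumption).
L3_MISS = re.compile(r"L3 miss but route for (\d{1,3}(?:\.\d{1,3}){3}) not found")


def count_l3_misses(lines):
    """Count how many "L3 miss" messages mention each destination IPv4 address."""
    counts = Counter()
    for line in lines:
        match = L3_MISS.search(line)
        if match:
            # group(1) is the full dotted-quad IP from the message
            counts[match.group(1)] += 1
    return counts
```

For example, `count_l3_misses(open("flannel.log")).most_common()` (where flannel.log is a hypothetical file captured with `kubectl logs`) lists the pod IPs for which flannel reported no route, most frequent first.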
- calico-node: http://paste.openstack.org/show/2R0SriTdthfMATf8t46V/
- policy controller: http://paste.openstack.org/show/ndCEAIAmYFmstYx2Hkxu/
- endpoints: http://paste.openstack.org/show/1im9g356CrPgFBT14f9o/
- profile: http://paste.openstack.org/show/OTVomtNV8CWn1Ikcp45m/
Your Environment
- Calico version: v2.5.0
- Flannel version: v0.8.0
- Orchestrator version: Kubespray from master
- Operating System and version: CoreOS stable (latest from GCE)
- Link to your project (optional): github.com/kubernetes-incubator/kubespray
More details:
- CoreOS + Canal works fine on Vagrant.
- CoreOS + Flannel works fine on all platforms (including GCE).
- CoreOS + Calico works fine on all platforms (including GCE).
- Ubuntu and CentOS + Flannel work fine on GCE.
I tried changing the flannel backend from vxlan to gce, but the behavior did not change.
The actual canal manifest being used: https://github.com/kubernetes-incubator/kubespray/blob/master/roles/network_plugin/canal/templates/canal-node.yaml.j2
I have the same issue.
k8s v1.8.1 + canal (1.7/canal.yaml)
- quay.io/calico/node:v2.6.1
- quay.io/calico/cni:v1.10.0
- quay.io/coreos/flannel:v0.8.0
Sounds like this issue: flannel-io/flannel#427
Looks like some vxlan improvements have made it into Flannel v0.9.0.
I'll look at getting a Canal release that includes Flannel v0.9.0 out ASAP. Hopefully that will resolve this problem.
The new Canal manifests (which have been moved to https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/canal/) now include Flannel v0.9.1, so I'm going to close this issue.
I see that kubespray has bumped to Flannel v0.9 as well, so if anyone is still hitting this there, make sure you've updated.
If anyone is still hitting this issue on Flannel v0.9+, please open a new issue instead of commenting here.
Thanks!
Confirmed: Flannel v0.9 fixes our issues.