projectcalico / canal

Policy based networking for cloud native applications

Canal on GCE using CoreOS does not work

mattymo opened this issue

Expected Behavior

Pods should be able to ping each other

Current Behavior

All services report healthy, but the flannel container in the canal pod repeatedly logs an error (quoted under Context below).

Possible Solution

Steps to Reproduce (for bugs)

  1. Create CoreOS instances in GCE
  2. Set up an ansible inventory such as this:
k8s-mattymo-test2-1 ansible_ssh_host=104.199.90.98
k8s-mattymo-test2-2 ansible_ssh_host=35.195.129.28
[kube-master]
k8s-mattymo-test2-1

[kube-node]
k8s-mattymo-test2-2

[etcd]
k8s-mattymo-test2-1

[k8s-cluster:children]
kube-node
kube-master
  3. Run ansible with -e kube_network_plugin=canal
  4. Try to ping pod IPs from any host, or from any pod to another.

Context

Pod logs:
I don't have full flannel logs at the moment, but this type of message repeats constantly:
5 vxlan_network.go:241] L3 miss but route for 10.233.95.3 not found
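
To see which destinations are affected and how often, the flannel log can be summarized per IP. The following is a minimal sketch, not part of the original report: the function name `count_l3_misses` is hypothetical, and the pattern is modeled only on the single log line quoted above, so other flannel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this issue (assumption:
# the wording is stable across the flannel build being inspected).
L3_MISS = re.compile(r"L3 miss but route for (\d{1,3}(?:\.\d{1,3}){3}) not found")


def count_l3_misses(lines):
    """Return a Counter mapping each destination IP to its number of L3 misses."""
    misses = Counter()
    for line in lines:
        match = L3_MISS.search(line)
        if match:
            misses[match.group(1)] += 1
    return misses


if __name__ == "__main__":
    # Sample input: the line from the report, repeated, plus an unrelated line.
    sample = [
        "5 vxlan_network.go:241] L3 miss but route for 10.233.95.3 not found",
        "5 vxlan_network.go:241] L3 miss but route for 10.233.95.3 not found",
        "some unrelated log line",
    ]
    for ip, n in count_l3_misses(sample).most_common():
        print(ip, n)
```

Feeding it the container's log lines (for example, saved to a file and read with `open(path)`) gives a quick view of whether the misses cluster on a few pod IPs or cover every remote pod.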

calico-node http://paste.openstack.org/show/2R0SriTdthfMATf8t46V/
policy controller http://paste.openstack.org/show/ndCEAIAmYFmstYx2Hkxu/
endpoints http://paste.openstack.org/show/1im9g356CrPgFBT14f9o/
profile http://paste.openstack.org/show/OTVomtNV8CWn1Ikcp45m/

Your Environment

  • Calico version: v2.5.0
  • Flannel version: v0.8.0
  • Orchestrator version: Kubespray from master
  • Operating System and version: CoreOS stable (latest from GCE)
  • Link to your project (optional): github.com/kubernetes-incubator/kubespray

More details:
CoreOS + Canal works fine on vagrant
CoreOS + Flannel works fine on all platforms (including GCE)
CoreOS + Calico works fine on all platforms (including GCE)
Ubuntu and CentOS + Flannel works fine on GCE

I tried changing the backend from vxlan to gce, but the behavior did not change.

The actual canal manifest being used: https://github.com/kubernetes-incubator/kubespray/blob/master/roles/network_plugin/canal/templates/canal-node.yaml.j2

commented

I have the same issue.

k8s v1.8.1 + canal (1.7/canal.yaml)

  • quay.io/calico/node:v2.6.1
  • quay.io/calico/cni:v1.10.0
  • quay.io/coreos/flannel:v0.8.0

Sounds like this issue: flannel-io/flannel#427
Looks like some vxlan improvements have made it into Flannel v0.9.0.

I'll look at getting a canal release out ASAP that includes Flannel v0.9.0. Hopefully that resolves this problem.

The new canal manifests (which have been moved to https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/canal/) now include Flannel v0.9.1 so I'm going to close this issue.

I see that kubespray has bumped to v0.9 as well, so if anyone is still hitting this there, make sure you've updated.

If you are still hitting this issue on Flannel v0.9+, please open a new issue instead of commenting here.

Thanks!

Confirmed: Flannel v0.9 fixes our issues.
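
For anyone checking whether a cluster already runs a Flannel image at or above the version discussed in this thread, the image tag can be compared against v0.9.0. This is a rough sketch under stated assumptions: the helper names `flannel_version` and `has_vxlan_fix` are hypothetical, the tag is assumed to follow the `vMAJOR.MINOR.PATCH` form used by the images listed above, and the v0.9.0 threshold comes only from the reports in this thread.

```python
import re


def flannel_version(image):
    """Extract (major, minor, patch) from an image reference such as
    'quay.io/coreos/flannel:v0.8.0'."""
    match = re.search(r":v?(\d+)\.(\d+)\.(\d+)", image)
    if match is None:
        raise ValueError(f"no semantic version tag found in {image!r}")
    return tuple(int(part) for part in match.groups())


def has_vxlan_fix(image, minimum=(0, 9, 0)):
    """Return True if the image tag is at least `minimum`.

    The default threshold is taken from this thread, not from a changelog.
    """
    return flannel_version(image) >= minimum


if __name__ == "__main__":
    for image in ("quay.io/coreos/flannel:v0.8.0", "quay.io/coreos/flannel:v0.9.1"):
        print(image, has_vxlan_fix(image))
```

Tuples compare element by element, so `(0, 9, 1) >= (0, 9, 0)` holds without any string comparison pitfalls such as `"0.10.0" < "0.9.0"`.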