projectcalico / canal

Policy based networking for cloud native applications

Kubernetes master node is 'untainted' when applying Canal CNI

RichWellum opened this issue · comments

Expected Behavior

I'm running Kubernetes on a CentOS 7.x VM; the purpose is to create an all-in-one (AIO) deployment to run kolla OpenStack images on. Typically, after I apply the canal.yaml CNI, I have to mark the master node as schedulable by 'untainting' it. This allows Kubernetes to be used as an AIO, creating OpenStack services on the one node. The command is:
kubectl taint nodes --all=true node-role.kubernetes.io/master:NoSchedule-

Current Behavior

In the last few days I have seen that the taint is already removed after applying canal. Here's my log:

[rwellum@kolla-k8s k8s]$ kubectl get nodes
NAME        STATUS     AGE       VERSION
kolla-k8s   NotReady   33s       v1.6.4

#Taint is there on initial bring-up:
[rwellum@kolla-k8s k8s]$ kubectl describe node kolla-k8s | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule

[rwellum@kolla-k8s k8s]$ # Now I will apply canal.yaml and check the node again:
[rwellum@kolla-k8s k8s]$ kubectl describe node kolla-k8s | grep -i taint
Taints:
[rwellum@kolla-k8s k8s]$ # Weirdly the Taint is gone...

[rwellum@kolla-k8s k8s]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE
kube-system   canal-jwnp8                         3/3       Running   0          28s
kube-system   etcd-kolla-k8s                      1/1       Running   0          1m
kube-system   kube-apiserver-kolla-k8s            1/1       Running   0          1m
kube-system   kube-controller-manager-kolla-k8s   1/1       Running   0          1m
kube-system   kube-dns-3913472980-6lpm5           0/3       Pending   0          1m
kube-system   kube-proxy-w1kf3                    1/1       Running   0          1m
kube-system   kube-scheduler-kolla-k8s            1/1       Running   0          1m
[rwellum@kolla-k8s k8s]$

Possible Solution

Steps to Reproduce (for bugs)

  1. I am following this deployment guide: https://docs.openstack.org/developer/kolla-kubernetes/deployment-guide.html
  2. Before applying canal (kubectl apply -f canal.yaml) - as my log above shows - check the taint
  3. Apply canal and then check the taint again; it will be gone
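
The before/after check in the steps above can be scripted. This is only a minimal sketch: the function name master_taint_state is hypothetical, and it assumes the taint appears in the `kubectl describe node` output exactly as it does in the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl describe node <name>` output on stdin
# and prints "present" if the master NoSchedule taint is listed, otherwise
# "missing". The taint string is copied from the log in this issue.
master_taint_state() {
    if grep -qF 'node-role.kubernetes.io/master:NoSchedule'; then
        echo present
    else
        echo missing
    fi
}

# Usage (assumes kubectl is configured and the node is named kolla-k8s):
#   kubectl describe node kolla-k8s | master_taint_state
#   kubectl apply -f canal.yaml
#   kubectl describe node kolla-k8s | master_taint_state
```

On an affected setup the first call would be expected to report the taint as present and the second as missing.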

Context

For me, since I am running an AIO, it has no effect other than a warning in my logs that the taint doesn't exist. However, for a multi-node or production deployment this would be an issue.

Your Environment

I'm also seeing this; it seems to be an issue with Calico. I don't have this issue when just running flannel on its own.

Flannel fixed this with the following: flannel-io/flannel#667

Edit: Calico on its own seems to work, so it seems to be just canal :\

Thanks for replying and pointing out the flannel fix. Hoping someone will look at this!

Thanks for raising this. I think this might be a bug in the Calico Kubernetes datastore driver.

@heschlie could you try to repro and determine where the problem is?

Taking a look

This looks to be fixed by updating calico/node from v1.2.1 to v1.3.0. I'll need to dig in further to see what changed to resolve this, so we can ensure it doesn't break again.

I'll open a PR to update the manifests to the latest versions of Calico. In the meantime, can you update your manifest locally and verify that it fixes the problem for you as well? You'll probably want to update calico/cni to v1.9.1 while you are there.
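
One possible way to make that local manifest change is sketched below. The function name pin_calico_images and the output file name are hypothetical, and the sed patterns assume the image lines look like `image: quay.io/calico/node:vX.Y.Z`; check the result by hand before applying it.

```shell
#!/bin/sh
# Hypothetical helper: reads a manifest on stdin and rewrites the calico/node
# and calico/cni image tags to the versions mentioned in this thread.
# Lines that do not reference those images pass through unchanged.
pin_calico_images() {
    sed -e 's#\(calico/node:\)v[0-9.]*#\1v1.3.0#' \
        -e 's#\(calico/cni:\)v[0-9.]*#\1v1.9.1#'
}

# Usage (writes a new file rather than editing canal.yaml in place):
#   pin_calico_images < canal.yaml > canal-updated.yaml
#   grep 'image:' canal-updated.yaml
```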

Going to close this for now, but please shout if upgrading to the latest manifests does not fix this issue. Thanks!

I think I am already using these versions as I download from canal/master:

$ curl -O https://raw.githubusercontent.com/projectcalico/canal/master/k8s-install/1.6/canal.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6474  100  6474    0     0  19755      0 --:--:-- --:--:-- --:--:-- 19798
$ cat canal.yaml | grep v1
apiVersion: v1
apiVersion: extensions/v1beta1
image: quay.io/calico/node:v1.3.0
image: quay.io/calico/cni:v1.9.1
apiVersion: v1

I just checked and as you said it appears to be working now. Thanks!

@RichWellum I merged #88 a few hours ago, so you're now getting the newer versions.

@heschlie - many thanks for the speedy work. By the way, I had this issue open for a while. For my education, is there anything I should have done to raise attention to it?

@RichWellum I think you did the right thing. We simply missed this issue, I believe.
Always feel free to "bump" an issue if it isn't getting the action you think it should, or come bug us in calicousers.slack.com. Actually, this may be a gap in our documentation; we could include some notes about submitting issues and the expected process. I just added #89 to address that.

Thank you very much. I should have thought about the Slack channel - too used to OpenStack and IRC... :)