jfrog / kubenab

Kubernetes Admission Webhook to enforce pulling of Docker images from the private registry.


Adding the private repository only to the Pod resource causes Pods created from a Deployment's pod template to fail.

lyyao09 opened this issue · comments

First, deploying kubenab in the Kubernetes cluster solves the problem of images in a YAML file that do not carry the private repository prefix, e.g.:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deployment
  template:
    metadata:
      labels:
        app: deployment
    spec:
      containers:
      - image: busybox  # kubenab will add the private repository automatically when the manifest is applied
        name: busybox
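
For illustration, assuming a hypothetical private repository host named registry.example.com, the container spec that kubenab persists after admission would look roughly like this:

      containers:
      - image: registry.example.com/busybox  # repository prefix added by the admission webhook
        name: busybox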

However, I found a problem when a node is powered off.

Suppose test-pod is first scheduled on node1 (kubenab is also scheduled on node1). If node1 is powered off, test-pod and kubenab will migrate to other nodes after a while.

If test-pod migrates first while kubenab is not yet ready, it pulls the image busybox:latest from the default docker.io repository and the pod ends up in ErrImagePull (I have to run kubectl delete -f xxx.yaml and then kubectl create -f xxx.yaml after kubenab is ready to work around this).

kubectl describe pod deployment-test-pod-567d7cf9bf-xdfvg
Events:
  Type     Reason          Age               From               Message
  ----     ------          ----              ----               -------
  Normal   Scheduled       20s               default-scheduler  Successfully assigned default/deployment-test-pod-567d7cf9bf-xdfvg to node2
  Normal   Pulling         13s               kubelet, node1     Pulling image "busybox:latest"
  Warning  Failed          13s               kubelet, node1     Failed to pull image "busybox:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:35116->[::1]:53: read: connection refused
  Warning  Failed          13s               kubelet, node1     Error: ErrImagePull

Looking at kubenab/cmd/kubenab/admissions.go, kubenab only adds the repository to the Pod resource. When a pod is created from a Deployment or StatefulSet template, the problem above occurs because no private repository is added to the Deployment or StatefulSet resource itself.

if !contains(whitelistedNamespaces, namespace) {
  pod := v1.Pod{}

Please confirm whether there is a real problem here or whether my usage is wrong (I used the official example).

This issue could be solved by setting the Pod Priority to system-cluster-critical, since kubenab is essential for the cluster to work correctly.
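
A minimal sketch of that idea, assuming a plain kubenab Deployment manifest (the image name and labels below are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubenab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubenab
  template:
    metadata:
      labels:
        app: kubenab
    spec:
      priorityClassName: system-cluster-critical  # highest built-in priority class
      containers:
      - name: kubenab
        image: kubenab:latest  # illustrative image reference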

I created a Pull Request (#18) which targets and solves your problem.

I wasn't able to reproduce your problem with that patch applied.

Please note that this patch (#18) does not implement fail-over/high-availability; it only sets the scheduling priority to the highest possible value so that the kubenab Pod is scheduled ahead of all other Pods.
Fail-over itself is targeted by PR #12 (which is currently work in progress).

But you could also simply increase the Replica Count of the kubenab deployment and then add a nodeSelector to prevent scheduling all the pods on the same Node.
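
Roughly, that workaround amounts to two changes in the kubenab Deployment; the node label below is hypothetical, and note that a nodeSelector by itself only restricts which nodes are eligible, while the podAntiAffinity shown later in this thread is what actually keeps the replicas on different nodes:

spec:
  replicas: 2                 # run more than one kubenab instance
  template:
    spec:
      nodeSelector:
        kubenab-node: "true"  # hypothetical label, apply it to the chosen nodes beforehand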

@l0nax, Thank you for your prompt reply.

I tried the method you provided and found that the problem still exists when the node is abnormally powered off.

I think the root cause is that only the image name in the Pod object (kind: Pod) is modified and persisted (not the Deployment or StatefulSet).

K8s version:

[root@node2 01-volume-emptyDir]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:23:26Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

You are right.
The MutatingAdmissionController only gets called on CREATE operations (see https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/mutating-webhook.yaml#L16).

I will push a change so that UPDATE operations are also covered by the Admission Controller.

Thanks for your future tests @lyyao09

EDIT:
Adding the UPDATE operation to the Admission Controller could also prevent issues if a user, for example, changes only the image version of a Pod.
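
For reference, covering UPDATE is a small change to the webhook's rules; here is a sketch of the relevant part of a MutatingWebhookConfiguration (the names, namespace, path and CA bundle below are illustrative, not the chart's exact template):

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: kubenab
webhooks:
- name: kubenab.example.com           # illustrative webhook name
  clientConfig:
    service:
      name: kubenab
      namespace: kube-system
      path: /mutate
    caBundle: <base64-encoded-CA>     # placeholder
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]  # UPDATE added so image edits are re-checked
    resources: ["pods"]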

@lyyao09 can you re-open the Issue please?

And your issue would be fixed via fail-over (which is targeted by PR #12); this would prevent issues when a node is abnormally powered off.

Editing Deployments/StatefulSets could be done, but they ultimately create Pods, and you can also create Pods without a Deployment or StatefulSet.

So it would be sub-optimal to change them.
A Pod is the "lowest" entity that gets created.

Yes, we should add the UPDATE operation as you say. However, this can't solve my problem.

As said in #17 (comment), fail-over/replication is the only way to fix this problem directly.

And priority classes only set the scheduling priority above all the other Pods. But Kubernetes has a "delay" after which it marks a Node as Unschedulable and NotReady.

@l0nax, I simply increased the replica count of the kubenab deployment and then added a podAntiAffinity to prevent scheduling all the pods on the same node.

After repeating the abnormal node power-off test, it works well.

What does your podAntiAffinity look like?

If you hard-code the node names, you prevent Kubernetes from dynamically scheduling the Pods.

As far as I know, Kubernetes doesn't have a feature where you can say "deploy at most 2 pods per node".

You could implement this feature by building an Operator which schedules the Pods dynamically.

I schedule kubenab onto the 3 master nodes, so each node has only one kubenab pod.

      ...
      nodeSelector:
        role: master
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - kubenab
            topologyKey: kubernetes.io/hostname  
      ...

@lyyao09 Sorry for the late answer. I didn't know how to use podAntiAffinity to schedule only one pod per node.

I will make a change to implement that feature, after understanding how it really works.

Thanks!

@l0nax Thank you for your enthusiastic answer.

@l0nax regarding podAntiAffinity, there is nothing to implement, it is already there:
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/values.yaml#L132
https://github.com/jfrog/kubenab/blob/master/chart/kubenab/templates/deployment.yaml#L57
:)
One example of adding it to the override-values.yaml file:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: helm.sh/chart
          operator: In
          values:
          - kubenab
      topologyKey: "kubernetes.io/hostname"

Any of these labels can be used for podAntiAffinity.

@rimusz oh sorry, I hadn't seen that.