Kubeinit / kubeinit

Ansible automation to have a KUBErnetes cluster INITialized as soon as possible...

Home Page: https://www.kubeinit.org

Workloads not working in k8s

ccamacho opened this issue · comments

Describe the bug
Workload Pods are not running

To Reproduce
Steps to reproduce the behavior:

  1. Deploy k8s
  2. Run a sample app
  3. The workloads are not allocated

Expected behavior
Workloads running

Additional context
1- From the controller, run kubectl get nodes; when the cluster has no workers, scheduling fails with this issue:

Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedScheduling        13m                   default-scheduler  0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.

We need to run the following commands to remove the master taints, because there are no workers in this CI cluster:

kubectl taint node controller-01.k8scluster.kubeinit.local node-role.kubernetes.io/master:NoSchedule-
kubectl taint node controller-02.k8scluster.kubeinit.local node-role.kubernetes.io/master:NoSchedule-
kubectl taint node controller-03.k8scluster.kubeinit.local node-role.kubernetes.io/master:NoSchedule-
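The three commands above can also be generated in a loop; a minimal dry-run sketch (the echo only prints each command, drop it to actually apply the change):

```shell
#!/bin/sh
# Dry-run sketch: emit the taint-removal command for each controller.
# Pipe the output to "sh" (or remove the echo) to actually apply it.
untaint_cmds() {
  for n in controller-01 controller-02 controller-03; do
    echo "kubectl taint node ${n}.k8scluster.kubeinit.local node-role.kubernetes.io/master:NoSchedule-"
  done
}
untaint_cmds
```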

2- After deploying a simple app:

[root@controller-01 ~]# kubectl get pods -l app=nginx
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-66b6c48dd5-2g7cj   0/1     ContainerCreating   0          13m
nginx-deployment-66b6c48dd5-46ntm   0/1     ContainerCreating   0          13m
nginx-deployment-66b6c48dd5-47k9r   0/1     ContainerCreating   0          13m
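For reference, the pod names and the app=nginx label above correspond to a Deployment called nginx-deployment with three replicas; a minimal manifest along those lines (the image tag is an assumption), matching the stock nginx example from the Kubernetes docs:

```yaml
# Minimal sketch of the sample app used above; matches the pod names
# (nginx-deployment-*) and the "app=nginx" label from the kubectl output.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2   # image/tag is an assumption
        ports:
        - containerPort: 80
```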

The pods hang in ContainerCreating because of:

Warning  FailedCreatePodSandBox  55s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_nginx-deployment-66b6c48dd5-2g7cj_default_4b860f60-a0ce-4a8a-a160-6651f7416f8c_0(908b72da2ea2e45cbcd48fa082220b65ed92284ebcacd8f79c420f5a2135cebb): error adding pod default_nginx-deployment-66b6c48dd5-2g7cj to CNI network "cbr0": failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.0.1/24
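A common remediation for this kind of mismatch (a sketch, not confirmed as KubeInit's actual fix) is to delete the stale cni0 bridge so the CNI plugin recreates it with the address from the current pod CIDR; to be run as root on each affected node:

```shell
#!/bin/sh
# Sketch only: remove a stale cni0 bridge left over from a previous CNI
# configuration; the plugin recreates it with the current pod CIDR on the
# next pod sandbox. Guarded so it is a no-op when cni0 does not exist.
if ip link show cni0 >/dev/null 2>&1; then
  ip link set cni0 down
  ip link delete cni0
  systemctl restart kubelet   # forces the failing sandboxes to be recreated
fi
```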

After running kubectl edit nodes controller-01.k8scluster.kubeinit.local, the cluster shows:

  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: controller-01.k8scluster.kubeinit.local
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    node-role.kubernetes.io/master: ""
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: controller-01.k8scluster.kubeinit.local
  resourceVersion: "6786"
  uid: b6933b3c-1f7f-453a-9793-61aff36efdbc
spec:
  podCIDR: 10.244.0.0/24
  podCIDRs:
  - 10.244.0.0/24
status:
  addresses:
  - address: 10.0.0.1
    type: InternalIP
  - address: controller-01.k8scluster.kubeinit.local
    type: Hostname
  allocatable:

Here the podCIDR 10.244.0.0/24 does not match https://github.com/Kubeinit/kubeinit/blob/main/kubeinit/roles/kubeinit_k8s/defaults/main.yml#L26

3- On the first controller, cni0 should (IIRC) have an address within 10.244.0.0/16, but it does not:

ip a
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d2:2c:da:fe:4b:77 brd ff:ff:ff:ff:ff:ff
    inet 10.85.0.1/16 brd 10.85.255.255 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 1100:200::1/24 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::d02c:daff:fefe:4b77/64 scope link 
       valid_lft forever preferred_lft forever
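The check in point 3 can be scripted; a small POSIX-shell sketch (function names are made up) that tests whether an address such as cni0's falls inside the expected 10.244.0.0/16 pod CIDR:

```shell
#!/bin/sh
# Sketch: check whether an IPv4 address falls inside a CIDR block,
# e.g. whether cni0's address is within the expected pod CIDR.

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

in_cidr() {
  # in_cidr ADDR NET/PREFIX -> exit 0 if ADDR is inside the block.
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( (0xffffffff << (32 - prefix)) & 0xffffffff ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_cidr 10.244.0.1 10.244.0.0/16 && echo "10.244.0.1 is inside 10.244.0.0/16"
in_cidr 10.85.0.1  10.244.0.0/16 || echo "10.85.0.1 is outside 10.244.0.0/16"
```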

We should create a small role to deploy a simple app and make sure the workloads are able to run.
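As a sketch of that role (all names here are hypothetical, this is not an existing KubeInit role), the tasks could boil down to deploying a throwaway workload and waiting for it to roll out:

```yaml
# Hypothetical tasks for a workload-verification role; role and
# resource names are illustrative only.
- name: Deploy a sample workload
  ansible.builtin.command: >
    kubectl create deployment kubeinit-smoke-test
    --image=nginx --replicas=1

- name: Wait for the workload to be scheduled and running
  ansible.builtin.command: >
    kubectl rollout status deployment/kubeinit-smoke-test --timeout=120s

- name: Clean up the sample workload
  ansible.builtin.command: kubectl delete deployment kubeinit-smoke-test
```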