vmware-tanzu / sonobuoy

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests and other plugins in an accessible and non-destructive manner.

Home Page: https://sonobuoy.io

Kubernetes 1.28.6 E2E tests are not running due to unschedulable master node

mehulgogri opened this issue

What steps did you take and what happened:
./sonobuoy delete --all --wait
./sonobuoy run --mode=certified-conformance --wait

time="2024-01-24T01:49:28Z" level=info msg="delete request issued" dry-run=false kind=clusterrolebindings names="[]"
time="2024-01-24T01:49:28Z" level=info msg="delete request issued" dry-run=false kind=clusterroles names="[]"
Namespace "sonobuoy" has been deleted
Deleted all ClusterRoles and ClusterRoleBindings.
All E2E namespaces deleted
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy namespace= resource=namespaces
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy namespace=sonobuoy resource=pods
time="2024-01-24T01:49:35Z" level=info msg="create request issued" name=sonobuoy-aggregator namespace=sonobuoy resource=services
01:49:55 Waiting for the aggregator status to become Running. Currently the status is Status: Pending, Reason: ContainersNotReady, containers with unready status: [kube-sonobuoy]
01:49:55 Details of containers that are not ready:
01:49:55 kube-sonobuoy: waiting: ContainerCreating
01:50:35          PLUGIN                                          NODE    STATUS   RESULT   PROGRESS
01:50:35             e2e                                        global   running                    
01:50:35    systemd-logs   m2-lr1-dev-vm209096.mip.storage.hpecorp.net   running                    
01:50:35    systemd-logs   m2-lr1-dev-vm209097.mip.storage.hpecorp.net   running                    
01:50:35    systemd-logs   m2-lr1-dev-vm209099.mip.storage.hpecorp.net   running                    
01:50:35 Sonobuoy is still running. Runs can take 60 minutes or more depending on cluster and plugin configuration.
01:51:15    systemd-logs   m2-lr1-dev-vm209096.mip.storage.hpecorp.net   complete                    
01:51:15    systemd-logs   m2-lr1-dev-vm209097.mip.storage.hpecorp.net   complete                    
01:51:15    systemd-logs   m2-lr1-dev-vm209099.mip.storage.hpecorp.net   complete                    
02:21:16             e2e                                        global   complete                    
02:21:16 Sonobuoy plugins have completed. Preparing results for download.
02:21:35             e2e                                        global   complete   failed           
02:21:35    systemd-logs   m2-lr1-dev-vm209096.mip.storage.hpecorp.net   complete   passed           
02:21:35    systemd-logs   m2-lr1-dev-vm209097.mip.storage.hpecorp.net   complete   passed           
02:21:35    systemd-logs   m2-lr1-dev-vm209099.mip.storage.hpecorp.net   complete   passed           
02:21:35 Sonobuoy has completed. Use `sonobuoy retrieve` to get results.
            e2e   complete   failed       1           
   systemd-logs   complete   passed       3           
Sonobuoy has completed. Use `sonobuoy retrieve` to get results.
Plugin: e2e
Status: failed
Total: 4
Passed: 3
Failed: 1
Skipped: 0
Failed tests:
Plugin: systemd-logs
Status: passed
Total: 3
Passed: 3
Failed: 0
Skipped: 0
Run Details:
API Server version: v1.28.6-hpe1
Node health: 3/3 (100%)
Pods health: 12/13 (92%)
Details for failed pods:
sonobuoy/sonobuoy-e2e-job-8b54e25fd7344f4a Ready:False: PodFailed: 
Errors detected in files:
1200 podlogs/kube-system/calico-worker-v8h7b/logs/calico-node.txt
  66 podlogs/kube-system/calico-typha-5769bc9c7-lnk5k/logs/calico-typha.txt
  41 podlogs/kube-system/calico-worker-97pgj/logs/calico-node.txt
  32 podlogs/kube-system/calico-master-stghw/logs/calico-node.txt
  26 podlogs/kube-system/calico-kube-controllers-7bd799d976-x9fw7/logs/calico-kube-controllers.txt
   6 podlogs/sonobuoy/sonobuoy-e2e-job-8b54e25fd7344f4a/logs/e2e.txt
   3 podlogs/kube-system/coredns-6c45976c97-tblqq/logs/coredns.txt
   3 podlogs/kube-system/coredns-6c45976c97-vxpqf/logs/coredns.txt
401 podlogs/kube-system/calico-worker-v8h7b/logs/calico-node.txt
 10 podlogs/kube-system/calico-master-stghw/logs/calico-node.txt
 10 podlogs/kube-system/calico-worker-97pgj/logs/calico-node.txt
  1 podlogs/kube-system/calico-kube-controllers-7bd799d976-x9fw7/logs/calico-kube-controllers.txt
  1 podlogs/sonobuoy/sonobuoy-e2e-job-8b54e25fd7344f4a/logs/e2e.txt
  1 podlogs/sonobuoy/sonobuoy/logs/kube-sonobuoy.txt
time="2024-01-24T02:21:41Z" level=info msg="delete request issued" dry-run=false kind=namespace namespace=sonobuoy
time="2024-01-24T02:21:41Z" level=info msg="delete request issued" dry-run=false kind=clusterrolebindings names="[sonobuoy-serviceaccount-sonobuoy]"
time="2024-01-24T02:21:42Z" level=info msg="delete request issued" dry-run=false kind=clusterroles names="[sonobuoy-serviceaccount-sonobuoy]"
Namespace "sonobuoy" has status {Phase:Terminating Conditions:[{Type:NamespaceDeletionDiscoveryFailure Status:False LastTransitionTime:2024-01-24 02:21:25 +0000 UTC Reason:ResourcesDiscovered Message:All resources successfully discovered} {Type:NamespaceDeletionGroupVersionParsingFailure Status:False LastTransitionTime:2024-01-24 02:21:25 +0000 UTC Reason:ParsedGroupVersions Message:All legacy kube types successfully parsed} {Type:NamespaceDeletionContentFailure Status:False LastTransitionTime:2024-01-24 02:21:25 +0000 UTC Reason:ContentDeleted Message:All content successfully deleted, may be waiting on finalization} {Type:NamespaceContentRemaining Status:True LastTransitionTime:2024-01-24 02:21:25 +0000 UTC Reason:SomeResourcesRemain Message:Some resources are remaining: pods. has 4 resource instances} {Type:NamespaceFinalizersRemaining Status:False LastTransitionTime:2024-01-24 02:21:25 +0000 UTC Reason:ContentHasNoFinalizers Message:All content-preserving finalizers finished}]}
Namespace "sonobuoy" has been deleted
Deleted all ClusterRoles and ClusterRoleBindings.
All E2E namespaces deleted
Done conformance-test
I see the following error in the e2e.txt file:
Jan 24 02:58:04.888: INFO: >>> kubeConfig: /tmp/kubeconfig-1910277190
Jan 24 02:58:04.890: INFO: Waiting up to 30m0s for all (but 0) nodes to be schedulable
  E0124 02:58:04.897527      23 progress.go:96] Failed to post progress update to http://localhost:8099/progress: Post "http://localhost:8099/progress": dial tcp connect: connection refused
  E0124 02:58:04.897635      23 progress.go:96] Failed to post progress update to http://localhost:8099/progress: Post "http://localhost:8099/progress": dial tcp connect: connection refused
Jan 24 02:58:04.901: INFO: Unschedulable nodes= 1, maximum value for starting tests= 0
Jan 24 02:58:04.901: INFO: 	-> Node m2-lr1-dev-vm209096.mip.storage.hpecorp.net [[[ Ready=true, Network(available)=true, Taints=[{node-role.kubernetes.io/master  NoSchedule <nil>}], NonblockingTaints=node-role.kubernetes.io/control-plane ]]]
Jan 24 02:58:04.901: INFO: ==== node wait: 2 out of 3 nodes are ready, max notReady allowed 0.  Need 1 more before starting.

What did you expect to happen:
I expected the e2e tests to run.

Anything else you would like to add:


  • Sonobuoy version: 0.57.1
  • Kubernetes version: (use kubectl version): 1.28.6
  • Kubernetes installer & version: 1.28.6
  • Cloud provider or hardware configuration: VMware VM
  • OS (e.g. from /etc/os-release): SLES15SP3
  • Sonobuoy tarball (which contains * below): sonobuoy_0.57.1_linux_amd64.tar.gz
  • Cluster - one master and two worker nodes

Attached e2e.txt file: e2e.txt

I ran the sonobuoy command with --non-blocking-taints passed through the e2e extra args, including the taint effect (:NoSchedule), but it did not work:

./sonobuoy run --mode=certified-conformance --wait --plugin-env e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/master:NoSchedule"

Could someone please share an example of how to run the sonobuoy run command with --non-blocking-taints?

I was able to run the e2e tests by passing only the taint key, without the :NoSchedule effect suffix:
./sonobuoy run --mode=certified-conformance --wait --plugin-env e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/master"
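
The difference between the failing and the working command is only the ":NoSchedule" suffix, which suggests the flag matches on taint keys alone. Below is a minimal shell sketch of a helper that strips the value and effect from taint specs and joins the remaining keys with commas. The function name taint_keys is hypothetical and not part of sonobuoy or kubectl, and the sketch assumes --non-blocking-taints accepts a comma-delimited list of keys; only the single-key form is confirmed by this issue.

```shell
#!/usr/bin/env bash
# taint_keys: hypothetical helper (not part of sonobuoy or kubectl).
# Takes taint specs such as "key:Effect" or "key=value:Effect" and prints
# the bare keys joined by commas.
taint_keys() {
  local out="" spec key
  for spec in "$@"; do
    # Drop everything from the first "=" or ":" onward, leaving only the key.
    key="${spec%%[=:]*}"
    # Append with a comma separator, except before the first key.
    out="${out:+$out,}$key"
  done
  printf '%s\n' "$out"
}

# Build the extra-args value from the taint reported in e2e.txt.
extra_args="--non-blocking-taints=$(taint_keys node-role.kubernetes.io/master:NoSchedule)"
echo "$extra_args"
```

The resulting string can then be passed as in the working command above, for example --plugin-env e2e.E2E_EXTRA_ARGS="$extra_args".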