The MachinePool remains in the "ScalingUp" phase instead of transitioning to "Running"
nitinthe0072000 opened this issue
What steps did you take and what happened?
We attempted to create a Kubernetes cluster on AWS using kubeadm as the bootstrap provider and CAPA as the infrastructure provider. The control plane deployed successfully on EC2. However, the MachinePool (worker nodes) remains in the "ScalingUp" phase instead of transitioning to "Running", even after the node joins the control plane and reaches the "Ready" state.
There is also a second issue: although the worker node attached to the control plane does become Ready, it always keeps the following taint: node.cluster.x-k8s.io/uninitialized:NoSchedule.
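To confirm the lingering taint, a minimal check against the workload cluster (assuming kubectl is pointed at the workload cluster; the node name is a placeholder):

```sh
# List the taints on the joined worker node (placeholder name).
kubectl get node <worker-node-name> -o jsonpath='{.spec.taints}'
```

The output still includes the node.cluster.x-k8s.io/uninitialized taint with effect NoSchedule even after the node reports Ready.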
The state of the cluster resources on the management cluster:

```
NAME                                DATA   AGE
configmap/aws-vpc-cni-driver-addon  1      29m

NAME                                       CLUSTERCLASS   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/kubeadm-cluster                  Provisioned   29m

NAME                                                         CLUSTER           READY   VPC                     BASTION IP
awscluster.infrastructure.cluster.x-k8s.io/kubeadm-cluster   kubeadm-cluster   true    vpc-01f450e7d16ae68c1

NAME                                                                               CLUSTER           INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/kubeadm-cluster-control-plane   kubeadm-cluster   true          true                   1          1       1         0             29m   v1.28.7

NAME                                                                                AGE
awsmachinetemplate.infrastructure.cluster.x-k8s.io/kubeadm-cluster-control-plane   29m

NAME                                                CLUSTER           REPLICAS   PHASE       AGE   VERSION
machinepool.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   1          ScalingUp   29m   v1.28.7

NAME                                                                   READY   REPLICAS   MINSIZE   MAXSIZE   LAUNCHTEMPLATE ID
awsmachinepool.infrastructure.cluster.x-k8s.io/kubeadm-cluster-mp-0   true    1          1         10        lt-0228e7376fb7fd6d5

NAME                                                            CLUSTER           AGE
kubeadmconfig.bootstrap.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   29m

NAME                                                 AGE
clusterresourceset.addons.cluster.x-k8s.io/crs-cni   29m
```
The CAPI controller logs show the following:
```
I0430 09:52:23.610332 1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610393 1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610414 1 machinepool_controller_noderef.go:87] "Cannot assign NodeRefs to MachinePool, no matching Nodes" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster"
```
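The providerID="" in these messages suggests that the Node object in the workload cluster has an empty .spec.providerID, so the controller has nothing to match against the single entry in the MachinePool's spec.providerIDList. One way to cross-check this (a sketch, assuming kubectl contexts exist for both clusters):

```sh
# Workload cluster: show each node's .spec.providerID (blank means it was never set).
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID

# Management cluster: show the provider IDs the MachinePool controller is trying to match.
kubectl get machinepool kubeadm-cluster-mp-0 -n default -o jsonpath='{.spec.providerIDList}'
```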
The kubectl describe output for the MachinePool:
```
Name:         kubeadm-cluster-mp-0
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=kubeadm-cluster
              nodepool=nodepool-0
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         MachinePool
Metadata:
  Creation Timestamp:  2024-04-30T09:44:52Z
  Finalizers:
    machinepool.cluster.x-k8s.io
  Generation:  3
  Owner References:
    API Version:     cluster.x-k8s.io/v1beta1
    Kind:            Cluster
    Name:            kubeadm-cluster
    UID:             12d5fe0d-55c4-454e-ae17-1f4c790ffdd1
  Resource Version:  1398339
  UID:               b7662d30-c459-4a7c-a859-a8b96e0c3bfa
Spec:
  Cluster Name:       kubeadm-cluster
  Min Ready Seconds:  0
  Provider ID List:
    aws:///eu-west-3a/i-03fb955e66b456xxx
  Replicas:  1
  Template:
    Metadata:
      Labels:
        Nodepool:  nodepool-0
    Spec:
      Bootstrap:
        Config Ref:
          API Version:     bootstrap.cluster.x-k8s.io/v1beta1
          Kind:            KubeadmConfig
          Name:            kubeadm-cluster-mp-0
          Namespace:       default
        Data Secret Name:  kubeadm-cluster-mp-0
      Cluster Name:        kubeadm-cluster
      Infrastructure Ref:
        API Version:  infrastructure.cluster.x-k8s.io/v1beta2
        Kind:         AWSMachinePool
        Name:         kubeadm-cluster-mp-0
        Namespace:    default
      Version:        v1.28.7
Status:
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-04-30T09:50:46Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2024-04-30T09:44:52Z
    Status:                True
    Type:                  ReplicasReady
  Infrastructure Ready:  true
  Observed Generation:   3
  Phase:                 ScalingUp
  Replicas:              1
  Unavailable Replicas:  1
Events:                  <none>
```
The CAPI manifest file used to create the cluster:
```yaml
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: kubeadm-cluster
  labels:
    cni: external
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: kubeadm-cluster
  controlPlaneRef:
    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: kubeadm-cluster-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: kubeadm-cluster
spec:
  region: eu-west-3
  sshKeyName: capi-server
---
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
  name: kubeadm-cluster-control-plane
spec:
  replicas: 1
  version: 1.28.7
  machineTemplate:
    infrastructureRef:
      kind: AWSMachineTemplate
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      name: kubeadm-cluster-control-plane
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
    clusterConfiguration:
      apiServer:
        extraArgs:
          authorization-mode: Node,RBAC
      etcd:
        local:
          dataDir: /var/lib/etcd
      kubernetesVersion: 1.28.7
      networking:
        dnsDomain: cluster.local
        serviceSubnet: 10.96.0.0/12
    joinConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
---
kind: AWSMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
metadata:
  name: kubeadm-cluster-control-plane
spec:
  template:
    spec:
      instanceType: t3a.large
      iamInstanceProfile: "control-plane.cluster-api-provider-aws.sigs.k8s.io"
      sshKeyName: capi-server
      ami:
        id: ami-05a64c4151d99b765
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
  labels:
    nodepool: nodepool-0
spec:
  clusterName: kubeadm-cluster
  replicas: 1
  template:
    metadata:
      labels:
        nodepool: nodepool-0
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: kubeadm-cluster-mp-0
      clusterName: kubeadm-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: kubeadm-cluster-mp-0
      version: 1.28.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  labels:
    nodepool: nodepool-0
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  minSize: 1
  maxSize: 10
  availabilityZones:
    - eu-west-3a
  awsLaunchTemplate:
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: t3a.large
    sshKeyName: capi-server
    ami:
      id: ami-05a64c4151d99b765
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: crs-cni
spec:
  clusterSelector:
    matchLabels:
      cni: external
  resources:
    - kind: ConfigMap
      name: aws-vpc-cni-driver-addon
  strategy: ApplyOnce
```
What did you expect to happen?
The MachinePool phase should transition to "Running" once the worker node becomes Ready. The node.cluster.x-k8s.io/uninitialized:NoSchedule taint should also be removed from the worker node.
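Both of those appear to depend on the MachinePool controller matching the workload-cluster Node to the pool by provider ID. Since the logs show providerID="" and no cloud controller manager appears in the manifests above, one commonly suggested workaround, shown here only as a sketch under that assumption (not a verified fix), is to have the kubelet set the provider ID itself through the KubeadmConfig joinConfiguration:

```yaml
# Hypothetical adjustment to the existing KubeadmConfig for the MachinePool.
# The provider-id template relies on cloud-init's EC2 metadata aliases; values are assumptions.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.local_hostname }}'
      kubeletExtraArgs:
        # Have the kubelet populate .spec.providerID on the Node so CAPI can match it
        # against the MachinePool's spec.providerIDList.
        provider-id: 'aws:///{{ ds.meta_data.placement.availability_zone }}/{{ ds.meta_data.instance_id }}'
```

Deploying the AWS cloud controller manager would be another way to get .spec.providerID populated; either approach only addresses the empty provider ID and may not be the whole story if this turns out to be a CAPI MachinePool bug.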
Cluster API version
CAPI version: 1.7.1
CAPA version: 2.4.2
Kubernetes version
Kubernetes: v1.28.7
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
This issue is currently awaiting triage.
CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.
cc @willie-yao @Jont828 @mboersma to take a first look and assign priority
/priority important-soon
This could be a bug in CAPI MachinePools, but we need to verify that it's not specific to AWS and try to find a fix either way.
/assign