The MachinePool remains in the "ScalingUp" phase instead of transitioning to "Running"
nitinthe0072000 opened this issue
What steps did you take and what happened?
We attempted to create a Kubernetes cluster on AWS using kubeadm as the bootstrap provider and CAPA as the infrastructure provider. The control plane deployed successfully on EC2. However, the MachinePool (worker nodes) remains in the "ScalingUp" phase instead of transitioning to "Running", even after the node joins the control plane and reaches the "Ready" state.
There is also a second issue: although the worker node attached to the control plane does become Ready, it always keeps the following taint: node.cluster.x-k8s.io/uninitialized:NoSchedule.
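To confirm the lingering taint, a minimal check against the workload cluster (assuming kubectl is pointed at the workload cluster; the node name is a placeholder):

```sh
# List the taints on the joined worker node (placeholder name).
kubectl get node <worker-node-name> -o jsonpath='{.spec.taints}'
```

The output still includes the node.cluster.x-k8s.io/uninitialized taint with effect NoSchedule even after the node reports Ready.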
The state of the cluster resources on the management cluster:

```
NAME                                DATA   AGE
configmap/aws-vpc-cni-driver-addon  1      29m

NAME                                       CLUSTERCLASS   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/kubeadm-cluster                  Provisioned   29m

NAME                                                         CLUSTER           READY   VPC                     BASTION IP
awscluster.infrastructure.cluster.x-k8s.io/kubeadm-cluster   kubeadm-cluster   true    vpc-01f450e7d16ae68c1

NAME                                                                               CLUSTER           INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/kubeadm-cluster-control-plane   kubeadm-cluster   true          true                   1          1       1         0             29m   v1.28.7

NAME                                                                                AGE
awsmachinetemplate.infrastructure.cluster.x-k8s.io/kubeadm-cluster-control-plane   29m

NAME                                                CLUSTER           REPLICAS   PHASE       AGE   VERSION
machinepool.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   1          ScalingUp   29m   v1.28.7

NAME                                                                   READY   REPLICAS   MINSIZE   MAXSIZE   LAUNCHTEMPLATE ID
awsmachinepool.infrastructure.cluster.x-k8s.io/kubeadm-cluster-mp-0   true    1          1         10        lt-0228e7376fb7fd6d5

NAME                                                            CLUSTER           AGE
kubeadmconfig.bootstrap.cluster.x-k8s.io/kubeadm-cluster-mp-0   kubeadm-cluster   29m

NAME                                                 AGE
clusterresourceset.addons.cluster.x-k8s.io/crs-cni   29m
```
The CAPI controller logs show the following:
```
I0430 09:52:23.610332 1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610393 1 machinepool_controller_noderef.go:168] "No ProviderID detected, skipping" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster" providerIDList=1 providerID=""
I0430 09:52:23.610414 1 machinepool_controller_noderef.go:87] "Cannot assign NodeRefs to MachinePool, no matching Nodes" controller="machinepool" controllerGroup="cluster.x-k8s.io" controllerKind="MachinePool" MachinePool="default/kubeadm-cluster-mp-0" namespace="default" name="kubeadm-cluster-mp-0" reconcileID="cff1204c-14f7-494b-b75b-4350252102f0" Cluster="default/kubeadm-cluster"
```
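The providerID="" in these messages suggests that the Node object in the workload cluster has an empty .spec.providerID, so the controller has nothing to match against the single entry in the MachinePool's spec.providerIDList. One way to cross-check this (a sketch, assuming kubectl contexts exist for both clusters):

```sh
# Workload cluster: show each node's .spec.providerID (blank means it was never set).
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID

# Management cluster: show the provider IDs the MachinePool controller is trying to match.
kubectl get machinepool kubeadm-cluster-mp-0 -n default -o jsonpath='{.spec.providerIDList}'
```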
The kubectl describe output for the MachinePool:
```
Name:         kubeadm-cluster-mp-0
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=kubeadm-cluster
              nodepool=nodepool-0
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         MachinePool
Metadata:
  Creation Timestamp:  2024-04-30T09:44:52Z
  Finalizers:
    machinepool.cluster.x-k8s.io
  Generation:  3
  Owner References:
    API Version:     cluster.x-k8s.io/v1beta1
    Kind:            Cluster
    Name:            kubeadm-cluster
    UID:             12d5fe0d-55c4-454e-ae17-1f4c790ffdd1
  Resource Version:  1398339
  UID:               b7662d30-c459-4a7c-a859-a8b96e0c3bfa
Spec:
  Cluster Name:       kubeadm-cluster
  Min Ready Seconds:  0
  Provider ID List:
    aws:///eu-west-3a/i-03fb955e66b456xxx
  Replicas:  1
  Template:
    Metadata:
      Labels:
        Nodepool:  nodepool-0
    Spec:
      Bootstrap:
        Config Ref:
          API Version:     bootstrap.cluster.x-k8s.io/v1beta1
          Kind:            KubeadmConfig
          Name:            kubeadm-cluster-mp-0
          Namespace:       default
        Data Secret Name:  kubeadm-cluster-mp-0
      Cluster Name:        kubeadm-cluster
      Infrastructure Ref:
        API Version:  infrastructure.cluster.x-k8s.io/v1beta2
        Kind:         AWSMachinePool
        Name:         kubeadm-cluster-mp-0
        Namespace:    default
      Version:        v1.28.7
Status:
  Bootstrap Ready:  true
  Conditions:
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-04-30T09:50:46Z
    Status:                True
    Type:                  BootstrapReady
    Last Transition Time:  2024-04-30T09:50:48Z
    Status:                True
    Type:                  InfrastructureReady
    Last Transition Time:  2024-04-30T09:44:52Z
    Status:                True
    Type:                  ReplicasReady
  Infrastructure Ready:  true
  Observed Generation:   3
  Phase:                 ScalingUp
  Replicas:              1
  Unavailable Replicas:  1
Events:                  <none>
```
The CAPI manifest file used to create the cluster:
```yaml
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: kubeadm-cluster
  labels:
    cni: external
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: kubeadm-cluster
  controlPlaneRef:
    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    name: kubeadm-cluster-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: kubeadm-cluster
spec:
  region: eu-west-3
  sshKeyName: capi-server
---
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
  name: kubeadm-cluster-control-plane
spec:
  replicas: 1
  version: 1.28.7
  machineTemplate:
    infrastructureRef:
      kind: AWSMachineTemplate
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      name: kubeadm-cluster-control-plane
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
    clusterConfiguration:
      apiServer:
        extraArgs:
          authorization-mode: Node,RBAC
      etcd:
        local:
          dataDir: /var/lib/etcd
      kubernetesVersion: 1.28.7
      networking:
        dnsDomain: cluster.local
        serviceSubnet: 10.96.0.0/12
    joinConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data.local_hostname }}'
---
kind: AWSMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
metadata:
  name: kubeadm-cluster-control-plane
spec:
  template:
    spec:
      instanceType: t3a.large
      iamInstanceProfile: "control-plane.cluster-api-provider-aws.sigs.k8s.io"
      sshKeyName: capi-server
      ami:
        id: ami-05a64c4151d99b765
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
  labels:
    nodepool: nodepool-0
spec:
  clusterName: kubeadm-cluster
  replicas: 1
  template:
    metadata:
      labels:
        nodepool: nodepool-0
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: kubeadm-cluster-mp-0
      clusterName: kubeadm-cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: kubeadm-cluster-mp-0
      version: 1.28.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  labels:
    nodepool: nodepool-0
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  minSize: 1
  maxSize: 10
  availabilityZones:
    - eu-west-3a
  awsLaunchTemplate:
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: t3a.large
    sshKeyName: capi-server
    ami:
      id: ami-05a64c4151d99b765
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: crs-cni
spec:
  clusterSelector:
    matchLabels:
      cni: external
  resources:
    - kind: ConfigMap
      name: aws-vpc-cni-driver-addon
  strategy: ApplyOnce
```
What did you expect to happen?
The MachinePool phase should transition to "Running" once the worker node becomes Ready. The node.cluster.x-k8s.io/uninitialized:NoSchedule taint should also be removed from the worker node.
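Both of those appear to depend on the MachinePool controller matching the workload-cluster Node to the pool by provider ID. Since the logs show providerID="" and no cloud controller manager appears in the manifests above, one commonly suggested workaround, shown here only as a sketch under that assumption (not a verified fix), is to have the kubelet set the provider ID itself through the KubeadmConfig joinConfiguration:

```yaml
# Hypothetical adjustment to the existing KubeadmConfig for the MachinePool.
# The provider-id template relies on cloud-init's EC2 metadata aliases; values are assumptions.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: kubeadm-cluster-mp-0
  namespace: default
spec:
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.local_hostname }}'
      kubeletExtraArgs:
        # Have the kubelet populate .spec.providerID on the Node so CAPI can match it
        # against the MachinePool's spec.providerIDList.
        provider-id: 'aws:///{{ ds.meta_data.placement.availability_zone }}/{{ ds.meta_data.instance_id }}'
```

Deploying the AWS cloud controller manager would be another way to get .spec.providerID populated; either approach only addresses the empty provider ID and may not be the whole story if this turns out to be a CAPI MachinePool bug.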
Cluster API version
CAPI version: 1.7.1
CAPA version: 2.4.2
Kubernetes version
Kubernetes: v1.28.7
Anything else you would like to add?
No response
Label(s) to be applied
/kind bug
This issue is currently awaiting triage.
CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.
cc @willie-yao @Jont828 @mboersma to take a first look and assign priority
/priority important-soon
This could be a bug in CAPI MachinePools, but we need to verify that it's not specific to AWS and try to find a fix either way.
/assign