aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

Home Page: https://karpenter.sh

Nodes are created but are not being registered to the cluster

nalshamaajc opened this issue · comments

Description

Observed Behavior: Nodes are created but are not being registered to the cluster

Expected Behavior: Nodes are created and are registered to the cluster

Reproduction Steps (Please include YAML):
Try upgrading from v0.31.5 to v0.32.0, following the upgrade guide. I skipped some steps because I use a Helm chart.

I was also using the role parameter rather than instanceProfile in the EC2NodeClass.

The NodePool creates a NodeClaim, but the node itself is never created.

I got the following in the NodeClaim status section:

status:
  allocatable:
    cpu: 7910m
    ephemeral-storage: 107Gi
    memory: 14162Mi
    pods: '58'
    vpc.amazonaws.com/pod-eni: '38'
  capacity:
    cpu: '8'
    ephemeral-storage: 120Gi
    memory: 15155Mi
    pods: '58'
    vpc.amazonaws.com/pod-eni: '38'
  conditions:
    - lastTransitionTime: '2024-05-30T13:31:00Z'
      message: Node not registered with cluster
      reason: NodeNotFound
      status: 'False'
      type: Initialized
    - lastTransitionTime: '2024-05-30T13:31:00Z'
      status: 'True'
      type: Launched
    - lastTransitionTime: '2024-05-30T13:31:00Z'
      message: Node not registered with cluster
      reason: NodeNotFound
      status: 'False'
      type: Ready
    - lastTransitionTime: '2024-05-30T13:31:00Z'
      message: Node not registered with cluster
      reason: NodeNotFound
      status: 'False'
      type: Registered
  imageID: ami-0fe93db2c62573b1a
  providerID: 'aws:///us-west-2c/i-0f84d19de2821fd6c'

In the logs I noticed the following entry:

"level":"DEBUG","time":"2024-05-30T13:46:01.646Z","logger":"controller.nodeclaim.lifecycle","message":"terminating due to registration ttl","commit":"f0eb822","nodeclaim":"default-hbgnd","nodepool":"default","ttl":"15m0s"}

The docs state that when using a role, Karpenter should be able to create and manage an instance profile, but it seems it does not add the role/service-account mapping to the aws-auth ConfigMap when using IRSA.

Migration guide

For each EC2NodeClass, specify the $KARPENTER_NODE_ROLE you will use for nodes launched with this node class. Karpenter v1beta1 drops the need for managing your own instance profile and uses node roles directly. The example below shows how to migrate your AWSNodeTemplate to an EC2NodeClass if your node role is the same role that was used when creating your cluster with the Getting Started Guide.
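
The guide's example isn't quoted above; as a rough sketch of what it describes (the role name and discovery tags below are placeholders, not values from this issue), a v1beta1 EC2NodeClass using a node role looks something like:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  # Node role (placeholder name); with this field set, Karpenter generates
  # and manages the instance profile itself instead of requiring one.
  role: KarpenterNodeRole-CLUSTER
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: CLUSTER
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: CLUSTER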

changelog

Karpenter will now auto-generate the instance profile in your EC2NodeClass, given the role that you specify.

Solution

The problem was solved when I used instanceProfile instead of role in the EC2NodeClass; the role associated with that instance profile was already mapped in the aws-auth ConfigMap, which made it work.
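
For context, a node can only join EKS if the IAM role it runs as is mapped in the aws-auth ConfigMap. A minimal sketch of such a mapping, with a placeholder account ID and role name:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Placeholder ARN; this must be the role Karpenter-launched nodes run as.
    # Without this entry the kubelet cannot authenticate to the API server,
    # so the node launches in EC2 but never registers.
    - rolearn: arn:aws:iam::ACCOUNT_ID:role/KarpenterNodeRole-CLUSTER
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes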

Versions:

  • Chart Version: v0.32.10
  • Kubernetes Version (kubectl version): EKS v1.29

Can you share the role that was being added to the EC2NodeClass?

@engedaam I assume you're interested in the policy attached to this role:

{
    "Statement": [
        {
            "Action": [
                "ec2:RunInstances",
                "ec2:CreateFleet"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ec2:REGION::snapshot/*",
                "arn:aws:ec2:REGION::image/*",
                "arn:aws:ec2:REGION:*:subnet/*",
                "arn:aws:ec2:REGION:*:spot-instances-request/*",
                "arn:aws:ec2:REGION:*:security-group/*",
                "arn:aws:ec2:REGION:*:launch-template/*"
            ],
            "Sid": "AllowScopedEC2InstanceActions"
        },
        {
            "Action": [
                "ec2:RunInstances",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/CLUSTER": "owned"
                },
                "StringLike": {
                    "aws:RequestTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ec2:REGION:*:volume/*",
                "arn:aws:ec2:REGION:*:network-interface/*",
                "arn:aws:ec2:REGION:*:launch-template/*",
                "arn:aws:ec2:REGION:*:instance/*",
                "arn:aws:ec2:REGION:*:fleet/*"
            ],
            "Sid": "AllowScopedEC2InstanceActionsWithTags"
        },
        {
            "Action": "ec2:CreateTags",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/CLUSTER": "owned",
                    "ec2:CreateAction": [
                        "RunInstances",
                        "CreateFleet",
                        "CreateLaunchTemplate"
                    ]
                },
                "StringLike": {
                    "aws:RequestTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ec2:REGION:*:volume/*",
                "arn:aws:ec2:REGION:*:network-interface/*",
                "arn:aws:ec2:REGION:*:launch-template/*",
                "arn:aws:ec2:REGION:*:instance/*",
                "arn:aws:ec2:REGION:*:fleet/*"
            ],
            "Sid": "AllowScopedResourceCreationTagging"
        },
        {
            "Action": "ec2:CreateTags",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "karpenter.sh/nodeclaim",
                        "Name"
                    ]
                },
                "StringEquals": {
                    "aws:ResourceTag/kubernetes.io/cluster/CLUSTER": "owned"
                },
                "StringLike": {
                    "aws:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:REGION:*:instance/*",
            "Sid": "AllowScopedResourceTagging"
        },
        {
            "Action": [
                "ec2:TerminateInstances",
                "ec2:DeleteLaunchTemplate"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/kubernetes.io/cluster/CLUSTER": "owned"
                },
                "StringLike": {
                    "aws:ResourceTag/karpenter.sh/nodepool": "*"
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ec2:REGION:*:launch-template/*",
                "arn:aws:ec2:REGION:*:instance/*"
            ],
            "Sid": "AllowScopedDeletion"
        },
        {
            "Action": [
                "ec2:DescribeSubnets",
                "ec2:DescribeSpotPriceHistory",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeImages",
                "ec2:DescribeAvailabilityZones"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "REGION"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowRegionalReadActions"
        },
        {
            "Action": "ssm:GetParameter",
            "Effect": "Allow",
            "Resource": "arn:aws:ssm:REGION::parameter/aws/service/*",
            "Sid": "AllowSSMReadActions"
        },
        {
            "Action": "pricing:GetProducts",
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowPricingReadActions"
        },
        {
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:GetQueueUrl",
                "sqs:GetQueueAttributes",
                "sqs:DeleteMessage"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sqs:REGION:xxxxxxxxxxxxxx:CLUSTER",
            "Sid": "AllowInterruptionQueueActions"
        },
        {
            "Action": "iam:PassRole",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "ec2.amazonaws.com"
                }
            },
            "Effect": "Allow",
            "Resource": "arn:aws:iam::xxxxxxxxxxxxxx:role/CLUSTER",
            "Sid": "AllowPassingInstanceRole"
        },
        {
            "Action": "iam:CreateInstanceProfile",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/CLUSTER": "owned",
                    "aws:RequestTag/topology.kubernetes.io/region": "REGION"
                },
                "StringLike": {
                    "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowScopedInstanceProfileCreationActions"
        },
        {
            "Action": "iam:TagInstanceProfile",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/CLUSTER": "owned",
                    "aws:RequestTag/topology.kubernetes.io/region": "REGION",
                    "aws:ResourceTag/kubernetes.io/cluster/CLUSTER": "owned",
                    "aws:ResourceTag/topology.kubernetes.io/region": "REGION"
                },
                "StringLike": {
                    "aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*",
                    "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowScopedInstanceProfileTagActions"
        },
        {
            "Action": [
                "iam:RemoveRoleFromInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:AddRoleToInstanceProfile"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/kubernetes.io/cluster/CLUSTER": "owned",
                    "aws:ResourceTag/topology.kubernetes.io/region": "REGION"
                },
                "StringLike": {
                    "aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowScopedInstanceProfileActions"
        },
        {
            "Action": "iam:GetInstanceProfile",
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "AllowInstanceProfileReadActions"
        },
        {
            "Action": "eks:DescribeCluster",
            "Effect": "Allow",
            "Resource": "arn:aws:eks:REGION:xxxxxxxxxxxxxx:cluster/CLUSTER",
            "Sid": "AllowAPIServerEndpointDiscovery"
        }
    ],
    "Version": "2012-10-17"
}

I'm interested in the KARPENTER_NODE_ROLE here; this would be the role that was specified in the EC2NodeClass. Also, have you looked at the troubleshooting guide for guidance on this issue? https://karpenter.sh/docs/troubleshooting/#node-notready

@engedaam I cannot expose the role details here; are there specific details you are interested in? Also, the nodes weren't showing in the node list, but checking under the EC2 instances tab showed they were created and ready.

I understand; I was mainly looking to make sure the node role contains the permissions needed to join the cluster. By default, here is the node role the team recommends: https://karpenter.sh/v0.32/reference/cloudformation/#node-authorization. The troubleshooting guide also gives steps for understanding why a node might not be joining the cluster and how to fix it.
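
For reference, an abridged sketch of the node role described in that CloudFormation reference (cluster name parameterized; see the linked page for the authoritative template):

KarpenterNodeRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: !Sub "KarpenterNodeRole-${ClusterName}"
    # EC2 instances assume this role via the instance profile.
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: ec2.amazonaws.com
          Action: sts:AssumeRole
    # Managed policies that let the node join the cluster, run the VPC CNI,
    # pull images from ECR, and be reachable via SSM.
    ManagedPolicyArns:
      - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
      - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
      - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"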

@nalshamaajc Any progress in getting nodes to join?

Is there a way to automatically patch the aws-auth ConfigMap to add the KarpenterNode role?
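
One declarative option, assuming you manage the cluster with eksctl, is an identity mapping in the ClusterConfig (applied with eksctl create iamidentitymapping). A sketch with placeholder names:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: CLUSTER    # placeholder cluster name
  region: REGION   # placeholder region
iamIdentityMappings:
  # Maps the Karpenter node role (placeholder ARN) into aws-auth so that
  # nodes launched with this role can register with the cluster.
  - arn: arn:aws:iam::ACCOUNT_ID:role/KarpenterNodeRole-CLUSTER
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes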

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.

I’m also trying to find something for that, but I’ve got nothing. Should I use the aws-auth module with output values from the karpenter module? Or should I use the IAM role for IRSA module? Or both, with a specific argument configuration?

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.