aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.

Home Page:https://github.com/aws/aws-parallelcluster

Cdk error when creating cluster on version 3.8.0

gallanik opened this issue

Required Info:

  • AWS ParallelCluster version: 3.8.0
  • Full cluster configuration without any credentials or personal data: pcluster-config-dev.yaml.txt
  • Cluster name: cluster380

Bug description
ParallelCluster version 3.8.0 introduces a parameter (Monitoring/Alarms/Enabled) to toggle Amazon CloudWatch Alarms for the cluster. When I try to create a cluster with Alarms/Enabled set to either True or False, it fails with the following error:

(pctest) sh-4.2$ pcluster create-cluster -n cluster380 -c config380.yaml -r us-east-1
{
  "message": "'ClusterCdkStack' object has no attribute 'head_node_alarms'"
}

The dry run succeeds, though:

(pctest) sh-4.2$ pcluster create-cluster -n cluster380 -c config380.yaml -r us-east-1 --dryrun true
{
  "validationMessages": [
    {
      "level": "WARNING",
      "type": "EfaValidator",
      "message": "The EC2 instance selected (g5.12xlarge) supports enhanced networking capabilities using Elastic Fabric Adapter (EFA). EFA enables you to run applications requiring high levels of inter-node communications at scale on AWS at no additional charge. You can update the cluster's configuration to enable EFA (https://docs.aws.amazon.com/parallelcluster/latest/ug/efa-v3.html)"
    },
    {
      "level": "WARNING",
      "type": "EfaValidator",
      "message": "The EC2 instance selected (g5.24xlarge) supports enhanced networking capabilities using Elastic Fabric Adapter (EFA). EFA enables you to run applications requiring high levels of inter-node communications at scale on AWS at no additional charge. You can update the cluster's configuration to enable EFA (https://docs.aws.amazon.com/parallelcluster/latest/ug/efa-v3.html)"
    },
    {
      "level": "WARNING",
      "type": "EfaValidator",
      "message": "The EC2 instance selected (g5.48xlarge) supports enhanced networking capabilities using Elastic Fabric Adapter (EFA). EFA enables you to run applications requiring high levels of inter-node communications at scale on AWS at no additional charge. You can update the cluster's configuration to enable EFA (https://docs.aws.amazon.com/parallelcluster/latest/ug/efa-v3.html)"
    },
    {
      "level": "WARNING",
      "type": "EfaValidator",
      "message": "The EC2 instance selected (p5.48xlarge) supports enhanced networking capabilities using Elastic Fabric Adapter (EFA). EFA enables you to run applications requiring high levels of inter-node communications at scale on AWS at no additional charge. You can update the cluster's configuration to enable EFA (https://docs.aws.amazon.com/parallelcluster/latest/ug/efa-v3.html)"
    },
    {
      "level": "WARNING",
      "type": "LdapTlsReqCertValidator",
      "message": "For security reasons it's recommended to use hard or demand"
    },
    {
      "level": "WARNING",
      "type": "DcvValidator",
      "message": "With this configuration you are opening DCV port 8443 to the world (0.0.0.0/0). It is recommended to restrict access."
    }
  ],
  "message": "Request would have succeeded, but DryRun flag is set."
}
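
To confirm at a glance that a dry-run response like the one above contains only warnings, the validation messages can be grouped by level and validator type. This is a minimal sketch, not part of the pcluster tooling: the function name `summarize_validation` is hypothetical, and the sample input is a trimmed stand-in for the real response with the message text shortened.

```python
import json
from collections import Counter

# Trimmed stand-in for the dry-run response (message text shortened to "...").
dry_run_output = """
{
  "validationMessages": [
    {"level": "WARNING", "type": "EfaValidator", "message": "..."},
    {"level": "WARNING", "type": "EfaValidator", "message": "..."},
    {"level": "WARNING", "type": "LdapTlsReqCertValidator", "message": "..."},
    {"level": "WARNING", "type": "DcvValidator", "message": "..."}
  ],
  "message": "Request would have succeeded, but DryRun flag is set."
}
"""


def summarize_validation(raw):
    """Count validation messages per (level, type) pair."""
    response = json.loads(raw)
    messages = response.get("validationMessages", [])
    return Counter((m["level"], m["type"]) for m in messages)


summary = summarize_validation(dry_run_output)
for (level, validator), count in sorted(summary.items()):
    print(f"{level:8} {validator:28} x{count}")
```

In practice the JSON would come from redirecting the output of the dry-run command to a file and reading that file instead of the inline string.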

Steps to reproduce

  1. Download the attached config and replace the account number and other parameters as needed.
  2. Try to create the cluster with the following command:
    pcluster create-cluster -n cluster380 -c config380.yaml -r us-east-1

Additional context:

  1. Here are the packages and their versions in my virtual environment:
(pctest) sh-4.2$ pip freeze
attrs==23.2.0
aws-cdk.assets==1.204.0
aws-cdk.aws-acmpca==1.204.0
aws-cdk.aws-apigateway==1.204.0
aws-cdk.aws-applicationautoscaling==1.204.0
aws-cdk.aws-autoscaling==1.204.0
aws-cdk.aws-autoscaling-common==1.204.0
aws-cdk.aws-autoscaling-hooktargets==1.204.0
aws-cdk.aws-batch==1.204.0
aws-cdk.aws-certificatemanager==1.204.0
aws-cdk.aws-cloudformation==1.204.0
aws-cdk.aws-cloudfront==1.204.0
aws-cdk.aws-cloudwatch==1.204.0
aws-cdk.aws-codebuild==1.204.0
aws-cdk.aws-codecommit==1.204.0
aws-cdk.aws-codeguruprofiler==1.204.0
aws-cdk.aws-codestarnotifications==1.204.0
aws-cdk.aws-cognito==1.204.0
aws-cdk.aws-dynamodb==1.204.0
aws-cdk.aws-ec2==1.204.0
aws-cdk.aws-ecr==1.204.0
aws-cdk.aws-ecr-assets==1.204.0
aws-cdk.aws-ecs==1.204.0
aws-cdk.aws-efs==1.204.0
aws-cdk.aws-elasticloadbalancing==1.204.0
aws-cdk.aws-elasticloadbalancingv2==1.204.0
aws-cdk.aws-events==1.204.0
aws-cdk.aws-fsx==1.204.0
aws-cdk.aws-globalaccelerator==1.204.0
aws-cdk.aws-iam==1.204.0
aws-cdk.aws-imagebuilder==1.204.0
aws-cdk.aws-kinesis==1.204.0
aws-cdk.aws-kms==1.204.0
aws-cdk.aws-lambda==1.204.0
aws-cdk.aws-logs==1.204.0
aws-cdk.aws-route53==1.204.0
aws-cdk.aws-route53-targets==1.204.0
aws-cdk.aws-s3==1.204.0
aws-cdk.aws-s3-assets==1.204.0
aws-cdk.aws-sam==1.204.0
aws-cdk.aws-secretsmanager==1.204.0
aws-cdk.aws-servicediscovery==1.204.0
aws-cdk.aws-signer==1.204.0
aws-cdk.aws-sns==1.204.0
aws-cdk.aws-sns-subscriptions==1.204.0
aws-cdk.aws-sqs==1.204.0
aws-cdk.aws-ssm==1.204.0
aws-cdk.aws-stepfunctions==1.204.0
aws-cdk.cloud-assembly-schema==1.204.0
aws-cdk.core==1.204.0
aws-cdk.custom-resources==1.204.0
aws-cdk.cx-api==1.204.0
aws-cdk.region-info==1.204.0
aws-parallelcluster==3.8.0
boto3==1.33.13
botocore==1.33.13
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
clickclick==20.10.2
connexion==2.13.1
constructs==3.4.344
exceptiongroup==1.2.0
Flask==2.2.5
idna==3.6
importlib-metadata==6.7.0
importlib-resources==5.12.0
inflection==0.5.1
itsdangerous==2.1.2
Jinja2==3.1.2
jmespath==0.10.0
jsii==1.85.0
jsonschema==4.17.3
MarkupSafe==2.1.3
marshmallow==3.19.0
packaging==23.2
pkgutil_resolve_name==1.3.10
publication==0.0.3
pyrsistent==0.19.3
python-dateutil==2.8.2
PyYAML==6.0.1
requests==2.31.0
s3transfer==0.8.2
six==1.16.0
tabulate==0.8.10
typeguard==2.13.3
typing_extensions==4.7.1
urllib3==1.26.18
Werkzeug==2.2.3
zipp==3.15.0

Hi @gallanik, I can't reproduce the issue after adding Monitoring/Alarms/Enabled: True to my generic simple cluster configuration.

Are you able to provide a simpler cluster configuration file where you can reproduce the issue?

More importantly, could you please provide the version of Python that you are using to create the cluster?
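
One way to collect the requested interpreter details together with the installed package version is a short script run inside the same virtual environment. This is only a sketch: the helper name `environment_report` is hypothetical, and the fallback import assumes the `importlib-metadata` backport is installed on interpreters older than Python 3.8.

```python
import platform
import sys

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport for older interpreters


def environment_report(packages=("aws-parallelcluster",)):
    """Return the interpreter version and the installed version of each package."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Printing the executable path as well helps confirm that the script ran with the virtualenv's interpreter rather than a system one.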

I tried it again today and, surprisingly, it worked. The installed packages are identical except for Jinja2, which was upgraded to 3.1.3, but that does not make any difference. I created a fresh virtualenv and installed pcluster again, and with the same configuration the cluster creation succeeded.
As of now, I am not sure why it was failing earlier, but I am glad it works. Thank you for looking into this.

attrs==23.2.0
aws-cdk.assets==1.204.0
aws-cdk.aws-acmpca==1.204.0
aws-cdk.aws-apigateway==1.204.0
aws-cdk.aws-applicationautoscaling==1.204.0
aws-cdk.aws-autoscaling==1.204.0
aws-cdk.aws-autoscaling-common==1.204.0
aws-cdk.aws-autoscaling-hooktargets==1.204.0
aws-cdk.aws-batch==1.204.0
aws-cdk.aws-certificatemanager==1.204.0
aws-cdk.aws-cloudformation==1.204.0
aws-cdk.aws-cloudfront==1.204.0
aws-cdk.aws-cloudwatch==1.204.0
aws-cdk.aws-codebuild==1.204.0
aws-cdk.aws-codecommit==1.204.0
aws-cdk.aws-codeguruprofiler==1.204.0
aws-cdk.aws-codestarnotifications==1.204.0
aws-cdk.aws-cognito==1.204.0
aws-cdk.aws-dynamodb==1.204.0
aws-cdk.aws-ec2==1.204.0
aws-cdk.aws-ecr==1.204.0
aws-cdk.aws-ecr-assets==1.204.0
aws-cdk.aws-ecs==1.204.0
aws-cdk.aws-efs==1.204.0
aws-cdk.aws-elasticloadbalancing==1.204.0
aws-cdk.aws-elasticloadbalancingv2==1.204.0
aws-cdk.aws-events==1.204.0
aws-cdk.aws-fsx==1.204.0
aws-cdk.aws-globalaccelerator==1.204.0
aws-cdk.aws-iam==1.204.0
aws-cdk.aws-imagebuilder==1.204.0
aws-cdk.aws-kinesis==1.204.0
aws-cdk.aws-kms==1.204.0
aws-cdk.aws-lambda==1.204.0
aws-cdk.aws-logs==1.204.0
aws-cdk.aws-route53==1.204.0
aws-cdk.aws-route53-targets==1.204.0
aws-cdk.aws-s3==1.204.0
aws-cdk.aws-s3-assets==1.204.0
aws-cdk.aws-sam==1.204.0
aws-cdk.aws-secretsmanager==1.204.0
aws-cdk.aws-servicediscovery==1.204.0
aws-cdk.aws-signer==1.204.0
aws-cdk.aws-sns==1.204.0
aws-cdk.aws-sns-subscriptions==1.204.0
aws-cdk.aws-sqs==1.204.0
aws-cdk.aws-ssm==1.204.0
aws-cdk.aws-stepfunctions==1.204.0
aws-cdk.cloud-assembly-schema==1.204.0
aws-cdk.core==1.204.0
aws-cdk.custom-resources==1.204.0
aws-cdk.cx-api==1.204.0
aws-cdk.region-info==1.204.0
aws-parallelcluster==3.8.0
boto3==1.33.13
botocore==1.33.13
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
clickclick==20.10.2
connexion==2.13.1
constructs==3.4.344
exceptiongroup==1.2.0
Flask==2.2.5
idna==3.6
importlib-metadata==6.7.0
importlib-resources==5.12.0
inflection==0.5.1
itsdangerous==2.1.2
Jinja2==3.1.3
jmespath==0.10.0
jsii==1.85.0
jsonschema==4.17.3
MarkupSafe==2.1.3
marshmallow==3.19.0
packaging==23.2
pkgutil_resolve_name==1.3.10
publication==0.0.3
pyrsistent==0.19.3
python-dateutil==2.8.2
PyYAML==6.0.1
requests==2.31.0
s3transfer==0.8.2
six==1.16.0
tabulate==0.8.10
typeguard==2.13.3
typing_extensions==4.7.1
urllib3==1.26.18
Werkzeug==2.2.3
zipp==3.15.0
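
The two package listings in this thread can also be compared mechanically instead of by eye. The following is a minimal sketch with hypothetical helper names (`parse_freeze`, `diff_freeze`); it only handles plain `name==version` lines and uses three-line excerpts of the listings as sample input.

```python
def parse_freeze(text):
    """Map package name -> pinned version from `pip freeze` style lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(before, after):
    """Return {name: (old, new)} for pins that changed, appeared, or vanished."""
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Three-line excerpts of the failing and working environments listed in this thread.
failing = "Flask==2.2.5\nJinja2==3.1.2\njsii==1.85.0\n"
working = "Flask==2.2.5\nJinja2==3.1.3\njsii==1.85.0\n"

print(diff_freeze(failing, working))  # {'jinja2': ('3.1.2', '3.1.3')}
```

With the full listings saved to two files, the same functions can be fed the file contents; a package present in only one environment shows up with `None` on the other side.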

Thank you for the update!