aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.

Home Page:https://github.com/aws/aws-parallelcluster

(3.0.0-3.9.0) Build image CloudFormation stacks fail to delete after images are successfully built

hanwen-pcluste opened this issue

The issue

Since September 19, 2023, the build image CloudFormation stack ends in DELETE_FAILED status after the image is successfully built. The failures in the stack events look like this:

| Timestamp | Logical ID | Status | Status reason |
| --- | --- | --- | --- |
| 2023-12-01 06:00:20 UTC-0800 | aws-parallelcluster-3-7-2-amzn2-hvm-arm64-202312011211 | DELETE_FAILED | The following resource(s) failed to delete: [DeleteStackFunctionExecutionRole]. |
| 2023-12-01 06:00:19 UTC-0800 | DeleteStackFunctionExecutionRole | DELETE_FAILED | Internal Failure |

The image is built correctly even though the stack is in DELETE_FAILED status, and you can use it as a custom AMI for cluster creation.
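To check whether an account has stacks left in this state, something like the following sketch may help. It is not part of ParallelCluster and is not the mitigation from the wiki. It assumes boto3 is installed and AWS credentials are configured. The helper names `affected_stacks` and `find_affected_stacks` are hypothetical. The match relies on the stack's status reason naming `DeleteStackFunctionExecutionRole`, as in the events above.

```python
from typing import Iterable, List

# Logical ID of the resource that fails to delete, taken from the stack events above.
FAILED_RESOURCE = "DeleteStackFunctionExecutionRole"


def affected_stacks(summaries: Iterable[dict]) -> List[str]:
    """Return names of stacks in DELETE_FAILED whose status reason names FAILED_RESOURCE."""
    return [
        s["StackName"]
        for s in summaries
        if s.get("StackStatus") == "DELETE_FAILED"
        and FAILED_RESOURCE in s.get("StackStatusReason", "")
    ]


def find_affected_stacks(region: str) -> List[str]:
    """List matching stacks in one region (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so affected_stacks() can be used without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    paginator = cfn.get_paginator("list_stacks")
    names: List[str] = []
    # Ask CloudFormation only for stacks that are currently in DELETE_FAILED.
    for page in paginator.paginate(StackStatusFilter=["DELETE_FAILED"]):
        names.extend(affected_stacks(page["StackSummaries"]))
    return names
```

For example, `find_affected_stacks("us-east-1")` would return the names of matching stacks in that region. The filter is deliberately narrow, so stacks that failed to delete for other reasons are not reported.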

Affected versions

ParallelCluster versions 3.0.0-3.7.2 are affected.

Mitigation

See the wiki for details: https://github.com/aws/aws-parallelcluster/wiki/(3.0.0%E2%80%903.7.2)-Build-image-CloudFormation-stacks-fail-to-delete-after-images-are-successfully-built