aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.

Home Page:https://github.com/aws/aws-parallelcluster

Lack of guidance on how to succeed with Service-Linked-Roles Using UI to create a FSX/Lustre S3 backed cluster

alfred-stokespace opened this issue · comments

Required Info:

  • AWS ParallelCluster version: 3.8.0
  • Full cluster configuration without any credentials or personal data.: mmm, nope.
  • Cluster name: REDACTED
  • Output of pcluster describe-cluster command.: Nope.
  • [Optional] Arn of the cluster CloudFormation main stack: Nope

Bug description and how to reproduce:
When you use the PCluster User Interface (API-GW/Lambda) and provide an S3 bucket config to create a cluster with an attached S3-backed FSX/Lustre file system, you get this error:

Amazon FSx is unable to create Service-Linked-Role to access the S3 bucket. 
Ensure the IAM role or user you are using has the required permissions. 
For more details, visit https://docs.aws.amazon.com/fsx/latest/LustreGuide/setting-up.html#fsx-adding-permissions-s3. 
(Service: AmazonFSx; Status Code: 400; Error Code: BadRequest; Request ID: eb6ce2e0-6c1f-4c6f-bb7c-d71cad7ea27c; Proxy: null)

The documentation at that link is far too generic to be useful for creating this kind of cluster.
-- and --
The user doesn't know which role to select to make the modifications to...
-- and --
The magic role (magic to the user) that needs the additional policy statement is already at its maximum of 10 attached policies (see my other issue related to max policies, #6114)

How To Get Past That Issue (but still fail with another)
Since I had no idea which of the several roles created by PCluster3's CFT needed changing, I went hunting through thousands of CloudTrail entries (not knowing what I was looking for until I found the "not authorized" statement) until I found the entry that named the role and gave this error message:

arn:aws:sts::REDACTED:assumed-role/REDACTEDPRFIXParallelClusterLambdaRole-REDACTEDSUFFIX/pcluster3-ui-cft-stack-3e3-ParallelClusterFunction-REDACTEDSUFFIX is not authorized to perform: iam:PutRolePolicy 
on resource: arn:aws:iam::REDACTED:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/AWSServiceRoleForFSxS3Access_fs-REDACTED

That combined with the documentation link mentioned above in the error got me to realize how to proceed.
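The hunt through CloudTrail can be narrowed by scanning only for the "is not authorized to perform" wording. Below is a minimal sketch, assuming the CloudTrail records have already been exported and parsed into Python dicts (each with the standard `errorMessage` field); `find_denials` and the `DENIED` pattern are hypothetical names, not part of ParallelCluster or any AWS tooling.

```python
import re

# Matches the IAM authorization-failure wording quoted in the error above.
DENIED = re.compile(
    r"(?P<principal>arn:aws:sts::\S+)\s+is not authorized to perform:\s+"
    r"(?P<action>[\w-]+:\w+)\s+on resource:\s+(?P<resource>arn:aws:\S+)"
)


def find_denials(records):
    """Yield (role_name, action, resource) for each denied call in the records."""
    for record in records:
        match = DENIED.search(record.get("errorMessage") or "")
        if not match:
            continue
        # Assumed-role ARNs look like ...:assumed-role/<role name>/<session name>.
        parts = match["principal"].split("/")
        role_name = parts[1] if len(parts) >= 3 else match["principal"]
        yield role_name, match["action"], match["resource"]
```

The first element of each tuple is the role whose permissions need the extra statement, which is the piece of information the error shown in the UI does not reveal.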

To continue, you cannot attach an additional policy (see the bug I linked), which would be the cleaner way. You have to scavenge among the 10 attached policies (or perhaps add an inline policy? I didn't think about that). Some of the 10 policies are near their maximum policy size, so you have to pick a policy that is small enough to have room for this statement (add only the statement, not a whole policy):

{
    "Effect": "Allow",
    "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
}
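To check whether a given policy has room before editing it, something like the following could help. It is a rough sketch, assuming the policy document has already been fetched as a Python dict and that the 6,144-character managed-policy quota (whitespace not counted) applies; `policy_size` and `add_statement` are hypothetical helper names.

```python
import json

# IAM managed-policy size quota in characters; whitespace is not counted.
MANAGED_POLICY_LIMIT = 6144

FSX_SLR_STATEMENT = {
    "Effect": "Allow",
    "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy",
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*",
}


def policy_size(document):
    """Approximate the size IAM counts: serialized JSON with all whitespace removed."""
    return len("".join(json.dumps(document, separators=(",", ":")).split()))


def add_statement(document, statement, limit=MANAGED_POLICY_LIMIT):
    """Return a copy of the policy with the statement appended, or raise if too large."""
    existing = document.get("Statement", [])
    if isinstance(existing, dict):  # a policy may hold a single statement object
        existing = [existing]
    updated = {**document, "Statement": [*existing, statement]}
    size = policy_size(updated)
    if size > limit:
        raise ValueError(f"policy would be {size} characters, over the {limit} limit")
    return updated
```

Running `add_statement(doc, FSX_SLR_STATEMENT)` against each of the 10 attached policies shows which ones can take the statement without hitting the size limit.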

Now, you have a solution to that one issue, albeit ugly.
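The inline-policy route mentioned in passing above was not tried in this issue, but it might look like the sketch below. Inline policies do not count toward the managed-policy attachment quota. This assumes a boto3 IAM client is passed in and that the caller is allowed to call `iam:PutRolePolicy` on the Lambda role; the function name and the default policy name are made up for illustration.

```python
import json


def put_fsx_slr_inline_policy(iam_client, role_name,
                              policy_name="FsxLustreS3ServiceLinkedRole"):
    """Attach the FSx service-linked-role statement to a role as an inline policy."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:CreateServiceLinkedRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                ],
                "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*",
            }
        ],
    }
    # put_role_policy creates the inline policy or overwrites one with the same name.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(document),
    )
    return document
```

A call would be `put_fsx_slr_inline_policy(boto3.client("iam"), role_name)`, where `role_name` is the ParallelClusterLambdaRole name found in CloudTrail.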

The next mystery is why the import fails...

Amazon FSx is unable to import objects from the linked data repository. Please file a ticket at https://console.aws.amazon.com/support/home#/. While filing your ticket, please include your file system ID and name of the linked data repository.