spack / spack-infrastructure

Spack Kubernetes instance and services running there (GitLab, CDash, spack.io)

Some roles broken after cluster upgrades

scottwittenburg opened this issue

After the last cluster upgrade, and again after this one, some roles were no longer usable because their "Trust relationships" in the AWS console still listed the OIDC provider ID from the old cluster. This affected package signing, spackbot's PR binary graduation, and other things. After the most recent cluster upgrade, I ran the following script, provided earlier by @mvandenburgh:

import boto3

client = boto3.client("iam")

old_oidc_id = "XXXXXXXXXXXXXXXXXXXXXXX"

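# list_roles is paginated, so walk every page rather than only the first
paginator = client.get_paginator("list_roles")
for page in paginator.paginate():
    for role in page["Roles"]:
        # str() flattens the trust policy so a substring match finds the old ID
        if old_oidc_id in str(role["AssumeRolePolicyDocument"]):
            role_name = role["RoleName"]
            role_arn = role["Arn"]
            print(f"{role_name} {role_arn}")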

And that produced the following:

FullCRUDAccessToBucketSpackBinariesPRs arn:aws:iam::588562868276:role/FullCRUDAccessToBucketSpackBinariesPRs
KarpenterControllerRole-spack arn:aws:iam::588562868276:role/KarpenterControllerRole-spack
KarpenterIRSA-spack-production-20230124204943653100000011 arn:aws:iam::588562868276:role/KarpenterIRSA-spack-production-20230124204943653100000011
PutObjectInPipelineStatistics arn:aws:iam::588562868276:role/PutObjectInPipelineStatistics
notary-role arn:aws:iam::588562868276:role/notary-role

I manually updated the trust relationships of those roles to use the new OIDC provider ID, but it would be good to encode this in Terraform or somewhere similar, so that it can be done automatically after the next upgrade.
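Until that is encoded somewhere, the manual update could in principle be scripted. The following is only a rough sketch, not something from this repo: `swap_oidc_id` and `update_roles` are hypothetical helper names, `client` is assumed to be a `boto3.client("iam")`, and the approach assumes the old ID appears verbatim in the trust policy (both in the federated principal ARN and in the condition keys), so a plain text replace is enough.

```python
import json


def swap_oidc_id(policy: dict, old_id: str, new_id: str) -> dict:
    """Return a copy of a trust policy with old_id replaced by new_id everywhere."""
    # The ID can appear in the Federated principal ARN and in condition keys,
    # so a textual replace over the serialized JSON covers both places.
    return json.loads(json.dumps(policy).replace(old_id, new_id))


def update_roles(client, old_id: str, new_id: str, dry_run: bool = True) -> list:
    """Find roles whose trust policy mentions old_id; rewrite them unless dry_run.

    Returns the names of the affected roles.
    """
    affected = []
    # list_roles is paginated, so iterate over every page
    for page in client.get_paginator("list_roles").paginate():
        for role in page["Roles"]:
            policy = role["AssumeRolePolicyDocument"]
            if old_id not in json.dumps(policy):
                continue
            affected.append(role["RoleName"])
            if not dry_run:
                client.update_assume_role_policy(
                    RoleName=role["RoleName"],
                    PolicyDocument=json.dumps(swap_oidc_id(policy, old_id, new_id)),
                )
    return affected
```

Calling `update_roles(client, old_oidc_id, new_oidc_id)` with the default `dry_run=True` only reports which roles would change, which seems like the safer first step before passing `dry_run=False`.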

Also, @AlmightyYakob mentioned that maybe we no longer even need KarpenterControllerRole-spack and KarpenterIRSA-spack-production-20230124204943653100000011, so maybe those roles can be removed? For now I just updated the OIDC provider ID, and will let @mvandenburgh decide whether they can be safely removed.

Looks like this is partly addressed by #485.

> Also, @AlmightyYakob mentioned that maybe we no longer even need KarpenterControllerRole-spack and KarpenterIRSA-spack-production-20230124204943653100000011, so maybe those roles can be removed? For now I just updated the OIDC provider ID, and will let @mvandenburgh decide whether they can be safely removed.

Yes, those roles were from the two previous clusters - I just deleted both of them.

As far as I can tell, between those two ^ and the ones @AlmightyYakob encoded into Terraform in #485, that leaves these roles:

FullCRUDAccessToBucketSpackBinariesPRs arn:aws:iam::588562868276:role/FullCRUDAccessToBucketSpackBinariesPRs
PutObjectInPipelineStatistics arn:aws:iam::588562868276:role/PutObjectInPipelineStatistics

These are actively used by spackbot and the gitlab-api-scrape cron job, respectively. I'll make a PR encoding them into Terraform as well.