aws-containers / amazon-ecs-exec-checker

🚀 Pre-flight checks for ECS Exec

need to check the environment variables for AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY

InbarRose opened this issue · comments

commented

We had an annoying issue where the checker showed that everything was configured okay, yet we could connect to some of our containers but not all of them. We had to get AWS support involved.

The error we got was:

An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

It turns out that SSM uses the AWS SDK, which by default picks up the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY environment variables if they are present. We had those configured in our container, but with very limited access, and this caused SSM to fail, so we couldn't use execute command.

I suggest adding a check when we "describe task" that looks for AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY defined in a container and warns that this could cause an issue. Ideally they should not be used, but if they are used then they need to grant SSM permissions.

ref: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default
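For illustration, here is a minimal sketch of the kind of check being proposed, written in Python with boto3 rather than the checker's actual bash implementation; the task definition name is a placeholder:

```python
import boto3

# Environment variable names that can override the task role inside the container.
FLAGGED = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
task_def = ecs.describe_task_definition(taskDefinition="my-task-def:1")["taskDefinition"]

for container in task_def["containerDefinitions"]:
    names = {env["name"] for env in container.get("environment", [])}
    hits = names & FLAGGED
    if hits:
        print(f"WARNING: container '{container['name']}' sets {sorted(hits)}; "
              "these may be used instead of the task role and break ECS Exec.")
```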

commented

I've opened a PR #50

@InbarRose which error did you get? I've started having a problem recently when attempting to connect to some of my containers: ecs exec checker shows no problem, but my "web" containers (which have AWS_ACCESS_KEY set for S3 access) fail with an error like the following:

An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

However, I don't get this error on other services. I'm wondering if this is the problem I'm running into - but it only started happening a week or so ago.

Having created a new task definition for a new service, identical except without the AWS_ACCESS_KEY_ID environment variable present, I'm now able to connect. So yes, it looks like this was the issue -- weird that it only started happening within the last week or so though.
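For anyone wanting to script that workaround, here is a hedged sketch (boto3; the task definition family name is a placeholder) of registering a new revision with the credential variables stripped out:

```python
import boto3

DROP = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
td = ecs.describe_task_definition(taskDefinition="web")["taskDefinition"]

# Remove the flagged environment variables from every container definition.
for container in td["containerDefinitions"]:
    container["environment"] = [
        env for env in container.get("environment", []) if env["name"] not in DROP
    ]

# register_task_definition rejects the read-only fields that describe_task_definition returns.
for key in ("taskDefinitionArn", "revision", "status", "requiresAttributes",
            "compatibilities", "registeredAt", "registeredBy"):
    td.pop(key, None)

new_td = ecs.register_task_definition(**td)
print("Registered:", new_td["taskDefinition"]["taskDefinitionArn"])
```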

commented

Yes, that was the error I got too.

Great utility, thanks for building it!

I'm having a similar issue and I'm trying to name the env variable something else. Full details: aws/copilot-cli#3408

I'll keep it posted here.

EDIT:

I can confirm: having AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in your environment variables will prevent you from connecting with ECS Exec.

We were also struggling with this issue and found we had AWS_SECRET_ACCESS_KEY set. Renaming it fixed the issue. @InbarRose @george-silva THANK YOU

Hey y'all, thanks for bringing this up and for proposing a fix! I do have a follow-up question to better understand the problem statement.

What is the reason for using the AWS_ACCESS_KEY_ID/SECRET as opposed to letting the AWS SDK pull credentials dynamically from the IAM role attached to the task? Generally it's recommended not to hardcode credentials as environment variables, as these values can be read in plaintext in the task definition.

Thanks again!

@adamjkeller it's common to have configs in ENV because of https://12factor.net/config. In our case they aren't plain-text in the task definition, but injected during deploy. We're moving away from this though.

commented

Well, I am happy that this is helpful to so many people.

The reason we had it is that even though we were using the task role for everything, one of the libraries we used expected these environment variables in order to connect to a resource outside the scope of the role that runs the task. We have since changed the library to use differently named environment variables.
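As a hypothetical sketch of that rename (the variable names below are made up for illustration), the library or app can read its own variables and pass them to the client explicitly, leaving the reserved AWS_* names unset in the container:

```python
import os
import boto3

# Credentials for the out-of-scope resource arrive under non-reserved names,
# so nothing else in the container picks them up by accident.
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["MYLIB_S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["MYLIB_S3_SECRET_ACCESS_KEY"],
)

# Clients created without explicit credentials keep using the task role.
sqs = boto3.client("sqs")
```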

@adamjkeller it's common to have configs in ENV because of https://12factor.net/config. In our case they aren't plain-text in the task definition, but injected during deploy. We're moving away from this though.

Thank you for the response!

Awesome, I'm very familiar with 12-factor! Using environment variables for dynamic configuration values is absolutely the right approach, but I want to better understand why some are using AWS_ACCESS_* env vars when the SDK will grab those values dynamically from the metadata endpoints available to the container at runtime (via the credential provider chain).

Examples of how the SDK grabs credentials:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
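For example, a quick way to see which provider in the chain boto3 actually used, assuming the snippet runs inside the task:

```python
import boto3

session = boto3.Session()
creds = session.get_credentials()
# Typically "env" if AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY took precedence,
# or "container-role" if the task role was fetched from the metadata endpoint.
print("Credential source:", creds.method)
```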

Well, I am happy that this is helpful to so many people.

The reason we had it is that even though we were using the task role for everything, one of the libraries we used expected these environment variables in order to connect to a resource outside the scope of the role that runs the task. We have since changed the library to use differently named environment variables.

Ah, I had a feeling this would be one of the reasons we see this. Glad the library gave the option to look at a different env var name. Thanks for sharing!

TL;DR: AWS Copilot does not fully support all AWS services equally, and we needed a way to access those services, so we injected KEY/SECRET variables into our containers.

Longer answer:

We are using Python to build an application and AWS Copilot to deploy it on ECS. The truth is, while Copilot is great, there are some pain points when integrating with non-Copilot things. For example, we have infrastructure built with Terraform that doesn't quite play well with Copilot's built-in way of doing things (or Copilot doesn't support it).

So our Terraform scripts create IAM profiles with the specific permissions for all the different services, and these are injected into our containers.

commented

@adamjkeller thanks for approving and merging my PR. I hope it helps. I guess this can be closed now?

commented

weird it only started happening within the last week or so though.

Yes, this is something we noticed too. AWS must have changed something internally. 🤷‍♂️

Awesome, I'm very familiar with 12-factor! Using environment variables for dynamic configuration values is absolutely the right approach, but I want to better understand why some are using AWS_ACCESS_* env vars when the SDK will grab those values dynamically from the metadata endpoints available to the container at runtime (via the credential provider chain).

In my case, it was legacy reasons -- we previously hosted on DigitalOcean, on plain VMs running Docker Compose, so these needed to be read from ENV rather than provided through integrated AWS auth.

When we migrated to AWS, I left the config in there for our app to use for S3 access.

@InbarRose, just to check: your PR checks for AWS_ACCESS_KEY, but @george-silva mentioned the issue was with AWS_ACCESS_KEY_ID. Are both causing the problem? I have the latter =)

commented

@InbarRose I can confirm that AWS_ACCESS_KEY_ID also gets you into trouble:

[image: diff of the task definition environment configuration]

The image shows a diff of our previous configuration, which set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, moving towards other names.

PS: don't worry, these are not the secrets themselves, just paths to resolve them internally in Copilot.

commented

Another PR to add AWS_ACCESS_KEY_ID: #52

Hi, you don't need to delete the AWS environment variables/credentials from your tasks.

  1. Go to IAM.
  2. Click on Users.
  3. In the search box, paste the value of your aws-access-key-id (from your task environment).
  4. Add SSM permissions to the user you found:
  5. {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Action": [
             "ssmmessages:CreateControlChannel",
             "ssmmessages:CreateDataChannel",
             "ssmmessages:OpenControlChannel",
             "ssmmessages:OpenDataChannel"
           ],
           "Resource": "*"
         }
       ]
     }
  6. Everything should work fine. I struggled with this for a long time and luckily AWS support helped me. Maybe it will help someone. GL and have a nice day.
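If you'd rather script steps 4 and 5 than use the console, a hedged boto3 equivalent could look like the sketch below; the user name and policy name are placeholders:

```python
import json
import boto3

# Inline policy granting the SSM messages permissions needed by ECS Exec.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ssmmessages:CreateControlChannel",
            "ssmmessages:CreateDataChannel",
            "ssmmessages:OpenControlChannel",
            "ssmmessages:OpenDataChannel",
        ],
        "Resource": "*",
    }],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="my-task-env-user",
    PolicyName="ecs-exec-ssm-messages",
    PolicyDocument=json.dumps(policy),
)
```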

Thanks for the feedback and contributions! Closing this issue as both PRs fix it up.

weird it only started happening within the last week or so though.

Yes, this is something we noticed too. AWS must have changed something internally. 🤷‍♂️

Yes indeed. Probably AWS changed something related to security and didn't announce it.

Sorry to comment on a closed issue, but it seems the new checks for AWS_ACCESS_KEY etc. only check the environment array and not the secrets array, both of which can be used to set environment variables inside the container.

I've currently got an issue where I'm loading the value for AWS_ACCESS_KEY from Secrets Manager and can't connect to ECS Exec as the variable is set, but these checks pass.

Sorry to comment on a closed issue, but it seems the new checks for AWS_ACCESS_KEY etc. only check the environment array and not the secrets array, both of which can be used to set environment variables inside the container. I've currently got an issue where I'm loading the value for AWS_ACCESS_KEY from Secrets Manager and can't connect to ECS Exec as the variable is set, but these checks pass.

Did you add permissions to the AWS_ACCESS_KEY_ID user like I wrote earlier?

Sure, I can add the permissions to fix this, or better still I'll rename that variable, but I just want to raise that the check itself doesn't cover all scenarios for setting the variable.
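For reference, here is a minimal sketch (Python/boto3, not the checker's actual bash code) of a check that covers both arrays; the task definition name is a placeholder:

```python
import boto3

FLAGGED = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
td = ecs.describe_task_definition(taskDefinition="my-task-def:1")["taskDefinition"]

for container in td["containerDefinitions"]:
    # Plaintext environment variables and Secrets Manager / SSM Parameter Store
    # secrets both end up as environment variables inside the container.
    env_names = {e["name"] for e in container.get("environment", [])}
    secret_names = {s["name"] for s in container.get("secrets", [])}
    hits = (env_names | secret_names) & FLAGGED
    if hits:
        print(f"WARNING: container '{container['name']}' sets {sorted(hits)} "
              "(via environment and/or secrets); this can break ECS Exec.")
```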