aws-containers / amazon-ecs-exec-checker

🚀 Pre-flight checks for ECS Exec

need to check the environment variables for AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY

InbarRose opened this issue · comments

commented

We had an annoying issue where the checker showed that everything was configured okay, yet we could connect to some of our containers but not all of them. We had to get AWS support involved.

The error we got was:

An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

It turns out that SSM uses the AWS SDK, which by default picks up the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY environment variables if they are present. We had those configured in our container, but with very limited access, and this caused SSM to fail, so we couldn't use execute command.

I suggest adding a check when we "describe task" that looks for AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY defined in a container and warns that this could cause an issue. Ideally they should not be used, but if they are used then they need to grant SSM permissions.

ref: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default
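For illustration, here is a minimal sketch of the kind of check being proposed, written in Python with boto3 rather than the checker's actual bash implementation; the task definition name is a placeholder:

```python
import boto3

# Environment variable names that can override the task role inside the container.
FLAGGED = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
task_def = ecs.describe_task_definition(taskDefinition="my-task-def:1")["taskDefinition"]

for container in task_def["containerDefinitions"]:
    names = {env["name"] for env in container.get("environment", [])}
    hits = names & FLAGGED
    if hits:
        print(f"WARNING: container '{container['name']}' sets {sorted(hits)}; "
              "these may be used instead of the task role and break ECS Exec.")
```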

commented

I've opened a PR #50

@InbarRose which error did you get? I've started having a problem recently when attempting to connect to some of my containers: ecs exec checker shows no problem, but my "web" containers (which have AWS_ACCESS_KEY set for S3 access) fail with an error like the following:

An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.

However, I don't get this error on other services. I'm wondering if this is the problem I'm running into - but it only started happening a week or so ago.

Having created a new task definition for a new service, identical except without the AWS_ACCESS_KEY_ID environment variable present, I'm now able to connect. So yes, it looks like this was the issue -- weird that it only started happening within the last week or so though.
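For anyone wanting to script that workaround, here is a hedged sketch (boto3; the task definition family name is a placeholder) of registering a new revision with the credential variables stripped out:

```python
import boto3

DROP = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
td = ecs.describe_task_definition(taskDefinition="web")["taskDefinition"]

# Remove the flagged environment variables from every container definition.
for container in td["containerDefinitions"]:
    container["environment"] = [
        env for env in container.get("environment", []) if env["name"] not in DROP
    ]

# register_task_definition rejects the read-only fields that describe_task_definition returns.
for key in ("taskDefinitionArn", "revision", "status", "requiresAttributes",
            "compatibilities", "registeredAt", "registeredBy"):
    td.pop(key, None)

new_td = ecs.register_task_definition(**td)
print("Registered:", new_td["taskDefinition"]["taskDefinitionArn"])
```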

commented

Yes, that was the error I got too.

Great utility, thanks for building it!

I'm having a similar issue and I'm trying to name the env variable something else. Full details: aws/copilot-cli#3408

I'll keep it posted here.

EDIT:

I can confirm: having AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in your environment variables will prevent you from connecting with ECS Exec.

We were also struggling with this issue and found we had AWS_SECRET_ACCESS_KEY set. Renaming it fixed the issue. @InbarRose @george-silva THANK YOU

Hey y'all, thanks for bringing this up and for proposing a fix! I do have a follow-up question to better understand the problem statement.

What is the reason for using the AWS_ACCESS_KEY_ID/SECRET as opposed to letting the AWS SDK pull credentials dynamically from the IAM role attached to the task? Generally it's recommended not to hardcode credentials as environment variables, as these values can be read in plaintext in the task definition.

Thanks again!

@adamjkeller it's common to have configs in ENV because of https://12factor.net/config. In our case they aren't plain-text in the task definition, but injected during deploy. We're moving away from this though.

commented

Well, I am happy that this is helpful to so many people.

The reason we had it is that even though we were using the task role for everything, one of the libraries we used expected these environment variables in order to connect to a resource outside the scope of the role that runs the task. We have since changed the library to use differently named environment variables.
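As a hypothetical sketch of that rename (the variable names below are made up for illustration), the library or app can read its own variables and pass them to the client explicitly, leaving the reserved AWS_* names unset in the container:

```python
import os
import boto3

# Credentials for the out-of-scope resource arrive under non-reserved names,
# so nothing else in the container picks them up by accident.
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["MYLIB_S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["MYLIB_S3_SECRET_ACCESS_KEY"],
)

# Clients created without explicit credentials keep using the task role.
sqs = boto3.client("sqs")
```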

@adamjkeller it's common to have configs in ENV because of https://12factor.net/config. In our case they aren't plain-text in the task definition, but injected during deploy. We're moving away from this though.

Thank you for the response!

Awesome, I'm very familiar with 12-factor! Using environment variables for dynamic configuration values is absolutely the right approach, but I want to better understand why some are using AWS_ACCESS_* env vars when the SDK will grab those values dynamically from the metadata endpoints available to the container at runtime (via the credential provider chain).

Examples of how the SDK grabs credentials:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html#credentials-chain
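For example, a quick way to see which provider in the chain boto3 actually used, assuming the snippet runs inside the task:

```python
import boto3

session = boto3.Session()
creds = session.get_credentials()
# Typically "env" if AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY took precedence,
# or "container-role" if the task role was fetched from the metadata endpoint.
print("Credential source:", creds.method)
```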

Well, I am happy that this is helpful to so many people.

The reason we had it is that even though we were using the task role for everything, one of the libraries we used expected these environment variables in order to connect to a resource outside the scope of the role that runs the task. We have since changed the library to use differently named environment variables.

Ah, I had a feeling this would be one of the reasons we see this. Glad the library gave the option to look at a different env var name. Thanks for sharing!

TL;DR: AWS Copilot does not fully support all AWS services equally, and we needed a way to access those services, so we injected KEY/SECRET variables into our containers.

Longer answer:

We are using Python to build an application and AWS Copilot to deploy it on ECS. The truth is, while Copilot is great, there are some pain points when integrating with non-Copilot things. For example, we have infrastructure built with Terraform that doesn't quite play well with Copilot's built-in way of doing things (or Copilot doesn't support it).

So our Terraform scripts create IAM profiles with the specific permissions for all the different services, and these are injected into our containers.

commented

@adamjkeller thanks for approving and merging my PR. I hope it helps. I guess this can be closed now?

commented

weird it only started happening within the last week or so though.

Yes, this is something we noticed too. AWS must have changed something internally. 🤷‍♂️

Awesome, I'm very familiar with 12-factor! Using environment variables for dynamic configuration values is absolutely the right approach, but I want to better understand why some are using AWS_ACCESS_* env vars when the SDK will grab those values dynamically from the metadata endpoints available to the container at runtime (via the credential provider chain).

In my case, it was legacy reasons -- we previously hosted on DigitalOcean, on plain VMs running Docker Compose, so these needed to be read from ENV rather than provided through integrated AWS auth.

When we migrated to AWS, I left the config in there for our app to use for S3 access.

@InbarRose, just to check: your PR checks for AWS_ACCESS_KEY, but @george-silva mentioned the issue was with AWS_ACCESS_KEY_ID. Are both causing the problem? I have the latter =)

commented

@InbarRose I can confirm that AWS_ACCESS_KEY_ID also gets you into trouble:

[image: diff of the task definition environment configuration]

The image shows a diff of our previous configuration, which set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, moving towards other names.

PS: don't worry, these are not the secrets themselves, just paths to resolve them internally in Copilot.

commented

Another PR to add AWS_ACCESS_KEY_ID: #52

Hi, you don't need to delete the AWS environment variables/credentials from your tasks.

  1. Go to IAM.
  2. Click on Users.
  3. In the search box, paste the value of your aws-access-key-id (from your task environment).
  4. Add SSM permissions to the user you found:
  5. {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Action": [
             "ssmmessages:CreateControlChannel",
             "ssmmessages:CreateDataChannel",
             "ssmmessages:OpenControlChannel",
             "ssmmessages:OpenDataChannel"
           ],
           "Resource": "*"
         }
       ]
     }
  6. Everything should work fine. I struggled with this for a long time and luckily AWS support helped me. Maybe it will help someone. GL and have a nice day.
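If you'd rather script steps 4 and 5 than use the console, a hedged boto3 equivalent could look like the sketch below; the user name and policy name are placeholders:

```python
import json
import boto3

# Inline policy granting the SSM messages permissions needed by ECS Exec.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ssmmessages:CreateControlChannel",
            "ssmmessages:CreateDataChannel",
            "ssmmessages:OpenControlChannel",
            "ssmmessages:OpenDataChannel",
        ],
        "Resource": "*",
    }],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="my-task-env-user",
    PolicyName="ecs-exec-ssm-messages",
    PolicyDocument=json.dumps(policy),
)
```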

Thanks for the feedback and contributions! Closing this issue as both PRs fix it up.

weird it only started happening within the last week or so though.

Yes, this is something we noticed too. AWS must have changed something internally. 🤷‍♂️

Yes indeed. Probably AWS changed something related to security and didn't announce it.

Sorry to comment on a closed issue, but it seems the new checks for AWS_ACCESS_KEY etc. only check the environment array and not the secrets array, both of which can be used to set environment variables inside the container.

I've currently got an issue where I'm loading the value for AWS_ACCESS_KEY from Secrets Manager and can't connect to ECS Exec as the variable is set, but these checks pass.

Sorry to comment on a closed issue, but it seems the new checks for AWS_ACCESS_KEY etc. only check the environment array and not the secrets array, both of which can be used to set environment variables inside the container. I've currently got an issue where I'm loading the value for AWS_ACCESS_KEY from Secrets Manager and can't connect to ECS Exec as the variable is set, but these checks pass.

Did you add permissions to the AWS_ACCESS_KEY_ID user like I wrote earlier?

Sure, I can add the permissions to fix this, or better still I'll rename that variable, but I just want to raise that the check itself doesn't cover all scenarios for setting the variable.
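For reference, here is a minimal sketch (Python/boto3, not the checker's actual bash code) of a check that covers both arrays; the task definition name is a placeholder:

```python
import boto3

FLAGGED = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

ecs = boto3.client("ecs")
td = ecs.describe_task_definition(taskDefinition="my-task-def:1")["taskDefinition"]

for container in td["containerDefinitions"]:
    # Plaintext environment variables and Secrets Manager / SSM Parameter Store
    # secrets both end up as environment variables inside the container.
    env_names = {e["name"] for e in container.get("environment", [])}
    secret_names = {s["name"] for s in container.get("secrets", [])}
    hits = (env_names | secret_names) & FLAGGED
    if hits:
        print(f"WARNING: container '{container['name']}' sets {sorted(hits)} "
              "(via environment and/or secrets); this can break ECS Exec.")
```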