aws-containers / amazon-ecs-exec-checker

🚀 Pre-flight checks for ECS Exec

"aws ecs execute-command" reports timeout despite all-green report from the checker

antifuchs opened this issue

Summary added by @toricls for people who arrive here via search engines.

Issue description

Exec into a container fails with the error "Encountered error while initiating handshake. Handshake timed out. Please ensure that you have the latest version of the session manager plugin." under the conditions below, even though the checker script reports all green.

Steps to reproduce the issue

This issue happens when using 1) AWS SSO temporary credentials and 2) KMS encryption enabled for ECS Exec. Here are the steps to reproduce the error.

  1. Configure the ECS cluster's ExecuteCommandConfiguration with any KMS key ID.
  2. Run aws ecs execute-command with AWS credentials obtained by the aws sso login command.

Root cause

The session-manager-plugin did not support AWS SSO temporary credentials when this issue was opened (see aws/session-manager-plugin#4).

Workaround

See the comment below.

UPDATE Aug. 23rd, 2021

The Session Manager plugin supports AWS SSO temporary credentials as of version 1.2.245.0.
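To check whether a locally installed plugin is new enough, something like the following might help. It is only a sketch in POSIX shell: the function names version_ge and plugin_supports_sso are made up for this example and are not part of check-ecs-exec.sh, and it assumes the version string is purely numeric and dot-separated, as in the 1.2.234.0 reported further down.

```shell
# Minimum Session Manager plugin version with AWS SSO credential support,
# taken from the update note above.
MIN_SSO_PLUGIN_VERSION=1.2.245.0

# version_ge A B: succeeds when dotted numeric version A >= B.
# Hypothetical helper; compares field by field, treating missing fields as 0.
# Plain POSIX sh has no local variables, so these names are global.
version_ge() {
  left=$1
  right=$2
  while [ -n "$left" ] || [ -n "$right" ]; do
    l=${left%%.*}
    r=${right%%.*}
    # Drop the field just read; a side with no dots left becomes empty.
    case "$left" in *.*) left=${left#*.} ;; *) left= ;; esac
    case "$right" in *.*) right=${right#*.} ;; *) right= ;; esac
    l=${l:-0}
    r=${r:-0}
    [ "$l" -gt "$r" ] && return 0
    [ "$l" -lt "$r" ] && return 1
  done
  return 0
}

# plugin_supports_sso VERSION: succeeds when VERSION is 1.2.245.0 or later.
plugin_supports_sso() {
  version_ge "$1" "$MIN_SSO_PLUGIN_VERSION"
}

# Possible usage (needs the plugin on PATH):
#   plugin_supports_sso "$(session-manager-plugin --version)" \
#     || echo "Upgrade the Session Manager plugin to use SSO credentials with KMS"
```

With the 1.2.234.0 plugin shown in the report below, plugin_supports_sso fails, which matches the behavior described in this issue.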


I'm pretty stumped by the failure mode we're seeing, and I imagine so is the exec-checker, as it reports an all-green status (with a warning for ssm:StartSession, which we will tune in due time):

$ env AWS_PROFILE=mz-core-admin ./check-ecs-exec.sh apps-cluster-255a2cf d63f19bb0b434ae489612b6d9215e5f8
-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
  jq      | OK (/run/current-system/sw/bin/jq)
  AWS CLI | OK (/run/current-system/sw/bin/aws)

-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
  AWS CLI Version        | OK (aws-cli/2.2.14 Python/3.9.5 Darwin/20.5.0 source/arm64 prompt/off)
  Session Manager Plugin | OK (1.2.234.0)

-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: apps-cluster-255a2cf
Task   : d63f19bb0b434ae489612b6d9215e5f8
-------------------------------------------------------------
  Cluster Configuration  |
     KMS Key       : arn:aws:kms:us-east-1:834237029485:key/5513178e-9387-4969-be9a-67dee8883899
     Audit Logging : OVERRIDE
     S3 Bucket Name: Not Configured
     CW Log Group  : /ecs/apps-cluster, Encryption Enabled: true
  Can I ExecuteCommand?  | arn:aws:iam::834237029485:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_Administrator_3c1b1a3b260fa528
     ecs:ExecuteCommand: allowed
     kms:GenerateDataKey: allowed
     ssm:StartSession denied?: allowed
  Task Status            | RUNNING
  Launch Type            | Fargate
  Platform Version       | 1.4.0
  Exec Enabled for Task  | OK
  Container-Level Checks |
    ----------
      Managed Agent Status
    ----------
         1. RUNNING for "bors"
    ----------
      Init Process Enabled (apps-bors:6)
    ----------
         1. Enabled - "bors"
    ----------
      Read-Only Root Filesystem (apps-bors:6)
    ----------
         1. Disabled - "bors"
  Task Role Permissions  | arn:aws:iam::834237029485:role/bors-task-b7530f5
     ssmmessages:CreateControlChannel: allowed
     ssmmessages:CreateDataChannel: allowed
     ssmmessages:OpenControlChannel: allowed
     ssmmessages:OpenDataChannel: allowed
     -----
     kms:Decrypt: allowed
     -----
     logs:DescribeLogGroups: allowed
     logs:CreateLogStream: allowed
     logs:DescribeLogStreams: allowed
     logs:PutLogEvents: allowed
  VPC Endpoints          | SKIPPED (vpc-0693f607384d76722 - No additional VPC endpoints required)

So it sounds like our task/cluster gets a clean bill of health from the checker, but trying to execute a command fails with a timeout:

$ aws ecs execute-command --profile mz-core-admin --region us-east-1 --cluster apps-cluster-255a2cf  --task d63f19bb0b434ae489612b6d9215e5f8 --container bors --command '/bin/sh' --interactive

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


Starting session with SessionId: ecs-execute-command-02212f04c8f7d94e3
----------ERROR-------
Encountered error while initiating handshake. Handshake timed out. Please ensure that you have the latest version of the session manager plugin.

I'm sure there's something wrong in our configuration of ECS exec, but if there is, I hope the checker can be extended to pick up whatever error is lurking in this config.

Hello @antifuchs!

I assume you're using the combination of AWS SSO temporary credentials (obtained via aws sso login) and a KMS key, which the Session Manager plugin doesn't support today. See aws/session-manager-plugin#4 and subscribe to that issue to get notified of updates.

In the meantime, I think one of the following workarounds should work for you. Could you give one of them a try?

  1. Configure your ECS cluster's log configuration without a KMS key.
  2. Grab the AWS CLI credentials from the AWS SSO Console instead of using aws sso login, and run aws ecs execute-command with them.

[Screenshot for workaround 2: sso-creds]

Thanks for that - we could not get (2) to work - same behavior as with the short-lived SSO-generated credentials. But disabling ECS cluster log encryption did entirely fix the issue. I'm not super excited to run with encryption off, but I guess we can make this a break-glass option.

To add to my previous comment - if you can detect that short-lived/SSO credentials are in play, I hope you can add a detector for those. That scenario cost me the better half of yesterday to debug (-:
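One possible shape for such a detector, as a rough sketch only: the checker output above shows the caller's role as .../aws-reserved/sso.amazonaws.com/AWSReservedSSO_Administrator_..., so matching on the AWSReservedSSO_ prefix in the caller ARN is a plausible heuristic. The function name looks_like_sso_arn is hypothetical, this is not what check-ecs-exec.sh actually does, and the match rests on the assumption that SSO-provisioned roles always carry that prefix.

```shell
# looks_like_sso_arn ARN: succeeds when the ARN appears to name a role
# provisioned by AWS SSO. Heuristic only: it assumes such role names
# contain "AWSReservedSSO_", as in the checker output above.
looks_like_sso_arn() {
  case "$1" in
    *AWSReservedSSO_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage (needs valid AWS credentials):
#   caller_arn=$(aws sts get-caller-identity --query Arn --output text)
#   if looks_like_sso_arn "$caller_arn"; then
#     echo "WARNING: AWS SSO credentials detected; ECS Exec with a KMS key"
#     echo "needs Session Manager plugin 1.2.245.0 or later"
#   fi
```

A warning like this would only make sense when the cluster also has a KMS key configured, since the failure needs both conditions.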

Thank you for your understanding. I totally agree that it should support KMS to enable a better security posture. I'll talk to the team offline to make sure this doesn't drop off the roadmap and to move things forward.

add a detector for short-lived/SSO credentials

Thanks, this is a great idea! I'll dig into what the script can do here.

Question - Would you mind if I 1) edit the issue title and 2) add a summary of this issue in your first issue comment, for users who will visit here via search engines?

Oh yes - please edit away. I hope anyone who stumbles across this finds it helpful!

Thanks @antifuchs, updated the issue comment!