99designs / aws-vault

A vault for securely storing and accessing AWS credentials in development environments

`aws-vault proxy` fails with HTTP 502 when parent process exits

aryounce opened this issue

This issue describes a failure when launching a macOS desktop application with aws-vault exec --server. When an application such as PyCharm is launched this way so that scripts using the AWS SDKs can rely on the local EC2 Metadata Server emulation, credential retrieval fails because the parent of the proxy processes exits.

Issue Checklist

  • I am using the latest release of AWS Vault
  • I have provided my .aws/config (redacted if necessary)
  • I have provided the debug output using aws-vault --debug (redacted if necessary)

aws-vault Version

$ aws-vault --version
6.6.2-Homebrew

Example Profile

[profile default]
region=us-east-1
output=json

[profile dev_profile]
region=us-east-1
output=json
credential_source=Ec2InstanceMetadata
mfa_serial=arn:aws:iam::XXXXXXXXXXXX:mfa/username
role_arn=arn:aws:iam::XXXXXXXXXXXX:role/limited-permissions-role

aws-vault invocation with --debug output

$ aws-vault --debug exec --server --prompt=osascript dev_profile -- open -a PyCharm.app

2023/02/07 18:41:56 aws-vault v6.6.1
2023/02/07 18:41:56 Loading config file /Users/username/.aws/config
2023/02/07 18:41:56 Parsing config file /Users/username/.aws/config
2023/02/07 18:41:56 [keyring] Considering backends: [keychain]
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 profile dev_profile: using stored credentials
2023/02/07 18:41:56 profile dev_profile: using GetSessionToken (with MFA)
2023/02/07 18:41:56 profile dev_profile: using AssumeRole (chained MFA)
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", account="sts.GetSessionToken,XXXXXXXxxXXxxxXx,XXXxXxXXxxxxXXXXXxXXXXxXXxXXXXxxXxxxXxXxxxxxXXXx,1675821116", keychain="aws-vault.keychain"
2023/02/07 18:42:00 [keyring] Found item "aws-vault session for dev_profile (expires 2023-02-08T01:51:56Z)"
2023/02/07 18:42:00 Re-using cached credentials ****************XXXX from sts.GetSessionToken, expires in 9m55.206728s
2023/02/07 18:42:01 Generated credentials ****************XXXX using AssumeRole, expires in 59m59.850563s
2023/02/07 18:42:01 Setting subprocess env: AWS_DEFAULT_REGION=us-east-1, AWS_REGION=us-east-1
2023/02/07 18:42:01 Starting child process: open -a PyCharm.app
2023/02/07 18:42:01 Starting EC2 Instance Metadata server on 127.0.0.1:9099

Problem

My understanding of what is happening comes from observing aws-vault during execution and reading through the source. I welcome additional context. Additionally, I'm referencing these sources for my assumptions:

The USAGE doc recommends open -a on macOS. The aws-vault invocation above follows this recommendation when launching PyCharm, but Python scripts using boto3 then fail to retrieve credentials from the aws-vault proxy with an HTTP 502 Bad Gateway error. boto3 debug logging includes the following:

2023-02-07 20:00:51,832 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: env
2023-02-07 20:00:51,832 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: assume-role
2023-02-07 20:00:51,833 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 169.254.169.254:80
2023-02-07 20:00:52,810 - MainThread - urllib3.connectionpool - DEBUG - http://169.254.169.254:80 "PUT /latest/api/token HTTP/1.1" 502 0
2023-02-07 20:00:52,820 - MainThread - urllib3.connectionpool - DEBUG - http://169.254.169.254:80 "GET /latest/meta-data/iam/security-credentials/ HTTP/1.1" 502 0
2023-02-07 20:00:52,820 - MainThread - botocore.utils - DEBUG - Metadata service returned non-200 response with status code of 502 for url: http://169.254.169.254/latest/meta-data/iam/security-credentials/, content body: b''
2023-02-07 20:00:52,820 - MainThread - botocore.utils - DEBUG - Max number of attempts exceeded (1) when attempting to retrieve data from metadata service.
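
The same check can be reproduced outside boto3. The following is a minimal sketch, assuming only the Python standard library; the function names probe_token_endpoint and describe and the hint wording are hypothetical, chosen for illustration. It sends the same PUT /latest/api/token request that appears in the log above and reports the status it gets back.

```python
import http.client

# Header defined by IMDSv2 for requesting a session token.
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"


def probe_token_endpoint(host="169.254.169.254", port=80, timeout=2.0):
    """Return the HTTP status of PUT /latest/api/token, or None if unreachable."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", "/latest/api/token", headers={TOKEN_TTL_HEADER: "21600"})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply.
        return None
    finally:
        conn.close()


def describe(status):
    """Map a probe result to a hint (illustrative wording, not aws-vault output)."""
    if status is None:
        return "nothing answered at the metadata address"
    if status == 502:
        return "proxy is up but its upstream credential server is gone"
    if status == 200:
        return "metadata endpoint is healthy"
    return f"unexpected status {status}"
```

Calling describe(probe_token_endpoint()) from a script started inside PyCharm should, under the conditions described in this issue, report the 502 case.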

The two aws-vault proxy processes are still running, but because open exits as soon as PyCharm starts, the aws-vault parent process exits as well. This is the core of the problem.

The work-around is to change the aws-vault invocation (note the addition of -W when calling open) to:

$ aws-vault --debug exec --server --prompt=osascript dev_profile -- open -W -a PyCharm.app

This keeps the parent aws-vault process alive for the duration that PyCharm is open, and the EC2 Instance Metadata "emulator" then works.
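
To illustrate why the lifetime of the launched command matters, here is a toy model in Python. It is not aws-vault's implementation: run_with_server, is_listening, QUICK_CHILD and the TOY_SERVER_PORT environment variable are all hypothetical names. The sketch only assumes the behavior described above, namely that the credential server lives exactly as long as the command it launched.

```python
import http.server
import os
import socket
import subprocess
import sys
import threading

# Stand-in for a launcher that returns immediately, like `open -a`.
QUICK_CHILD = [sys.executable, "-c", "pass"]


class _Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"fake-credentials")

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_with_server(child_cmd):
    """Serve on an ephemeral port only while child_cmd runs.

    Returns (port, child_return_code). The server is gone by the time
    this function returns, no matter what the child started meanwhile.
    """
    server = http.server.HTTPServer(("127.0.0.1", 0), _Handler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    env = dict(os.environ, TOY_SERVER_PORT=str(port))
    try:
        result = subprocess.run(child_cmd, env=env, check=False)
    finally:
        server.shutdown()
        server.server_close()
    return port, result.returncode


def is_listening(port):
    """Return True if something accepts TCP connections on 127.0.0.1:port."""
    try:
        socket.create_connection(("127.0.0.1", port), timeout=1).close()
        return True
    except OSError:
        return False
```

With QUICK_CHILD the server disappears almost immediately, which mirrors the open -a case. A child that blocks until the application quits, as open -W does, keeps the server reachable for that whole time.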

Issues with existing behavior

The primary problem with the current behavior is that it is very confusing. Even with debug logging turned on in the process using the AWS SDK, in aws-vault, or in both, neither log makes the cause obvious. Programs using the AWS SDK run through the various credential acquisition mechanisms and then simply report that all of them failed.

The fact that the aws-vault proxy processes live on beyond the initial aws-vault invocation made this particularly confusing to track down.

Suggested Changes

One or more of the following would help make this failure case less confusing.

  • Shut down the proxy processes when the original aws-vault parent process exits.
  • Return an error message in the response to the client that reflects the failure of the proxy to acquire credentials from the parent (in the event of a race where a metadata server client makes a request that cannot be serviced before the proxy exits).
  • Better document aws-vault expectations of the sub-processes it launches on behalf of the user.
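
As a rough illustration of the second suggestion, the sketch below shows a forwarding step that returns an explanatory body instead of an empty 502 when the upstream server cannot be reached. It is written in Python for brevity and is not taken from aws-vault's source; forward_or_explain and the message text are hypothetical. The upstream address 127.0.0.1:9099 is the one shown in the debug output above.

```python
import http.client

# Address of the credential server started by the parent process,
# as reported in the aws-vault debug log.
UPSTREAM_HOST = "127.0.0.1"
UPSTREAM_PORT = 9099


def forward_or_explain(path, host=UPSTREAM_HOST, port=UPSTREAM_PORT, timeout=1.0):
    """Forward a GET to the upstream server and return (status, body).

    If the upstream cannot be reached, return 502 with a body that tells
    the client why, rather than an empty response.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    except (OSError, http.client.HTTPException) as exc:
        message = (
            f"aws-vault credential server at {host}:{port} is unreachable "
            f"({exc.__class__.__name__}); the parent aws-vault process may have exited"
        )
        return 502, message.encode("utf-8")
    finally:
        conn.close()
```

A client that logs the response body, as botocore does in the debug output quoted earlier, would then show the reason next to the 502 status.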

Thank you @mtibben, that PR looks great.