`aws-vault proxy` fails with HTTP 502 when parent process exits
aryounce opened this issue · comments
This issue outlines a failure case when launching a macOS desktop application with `aws-vault exec --server`. When running an application such as PyCharm, where the user wants to run scripts that use the AWS SDKs through the local EC2 Metadata Server emulation, `aws-vault` fails because the parent of the proxy process exits.
Issue Checklist
- I am using the latest release of AWS Vault
- I have provided my `.aws/config` (redacted if necessary)
- I have provided the debug output using `aws-vault --debug` (redacted if necessary)
aws-vault Version
$ aws-vault --version
6.6.2-Homebrew
Example Profile
[profile default]
region=us-east-1
output=json
[profile dev_profile]
region=us-east-1
output=json
credential_source=Ec2InstanceMetadata
mfa_serial=arn:aws:iam::XXXXXXXXXXXX:mfa/username
role_arn=arn:aws:iam::XXXXXXXXXXXX:role/limited-permissions-role
aws-vault invocation with (--debug) output
$ aws-vault --debug exec --server --prompt=osascript dev_profile -- open -a PyCharm.app
2023/02/07 18:41:56 aws-vault v6.6.1
2023/02/07 18:41:56 Loading config file /Users/username/.aws/config
2023/02/07 18:41:56 Parsing config file /Users/username/.aws/config
2023/02/07 18:41:56 [keyring] Considering backends: [keychain]
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 profile dev_profile: using stored credentials
2023/02/07 18:41:56 profile dev_profile: using GetSessionToken (with MFA)
2023/02/07 18:41:56 profile dev_profile: using AssumeRole (chained MFA)
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", keychain="aws-vault.keychain"
2023/02/07 18:41:56 [keyring] Found 2 results
2023/02/07 18:41:56 [keyring] Querying keychain for service="aws-vault", account="sts.GetSessionToken,XXXXXXXxxXXxxxXx,XXXxXxXXxxxxXXXXXxXXXXxXXxXXXXxxXxxxXxXxxxxxXXXx,1675821116", keychain="aws-vault.keychain"
2023/02/07 18:42:00 [keyring] Found item "aws-vault session for dev_profile (expires 2023-02-08T01:51:56Z)"
2023/02/07 18:42:00 Re-using cached credentials ****************XXXX from sts.GetSessionToken, expires in 9m55.206728s
2023/02/07 18:42:01 Generated credentials ****************XXXX using AssumeRole, expires in 59m59.850563s
2023/02/07 18:42:01 Setting subprocess env: AWS_DEFAULT_REGION=us-east-1, AWS_REGION=us-east-1
2023/02/07 18:42:01 Starting child process: open -a PyCharm.app
2023/02/07 18:42:01 Starting EC2 Instance Metadata server on 127.0.0.1:9099
Problem
My understanding of what is happening comes from reviewing `aws-vault` behavior during execution and reading through the source. I welcome additional context. My assumptions also draw on these sources:
- Issue #174 comment on "two servers"
- USAGE document 'Desktop Apps' section.
The USAGE doc recommends `open -a` on macOS. The `aws-vault` invocation above follows this recommendation when launching PyCharm, but Python scripts using `boto3` then fail to retrieve credentials from the `aws-vault proxy` with an `HTTP 502 Bad Gateway` error. `boto3` debug logging includes the following:
2023-02-07 20:00:51,832 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: env
2023-02-07 20:00:51,832 - MainThread - botocore.credentials - DEBUG - Looking for credentials via: assume-role
2023-02-07 20:00:51,833 - MainThread - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 169.254.169.254:80
2023-02-07 20:00:52,810 - MainThread - urllib3.connectionpool - DEBUG - http://169.254.169.254:80 "PUT /latest/api/token HTTP/1.1" 502 0
2023-02-07 20:00:52,820 - MainThread - urllib3.connectionpool - DEBUG - http://169.254.169.254:80 "GET /latest/meta-data/iam/security-credentials/ HTTP/1.1" 502 0
2023-02-07 20:00:52,820 - MainThread - botocore.utils - DEBUG - Metadata service returned non-200 response with status code of 502 for url: http://169.254.169.254/latest/meta-data/iam/security-credentials/, content body: b''
2023-02-07 20:00:52,820 - MainThread - botocore.utils - DEBUG - Max number of attempts exceeded (1) when attempting to retrieve data from metadata service.
The two `aws-vault proxy` processes are running, but since `open` exits once PyCharm starts, the `aws-vault` parent process exits as well. This is the core of the problem.
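To check for this state independently of boto3, the metadata endpoint can be queried directly. The following is a minimal sketch using only the Python standard library. `probe_metadata` is a hypothetical helper (not part of aws-vault or boto3), and the address and path are copied from the boto3 debug log above. A result of 502 would be consistent with the proxy still listening while the server it forwards to has gone away. `None` would suggest that nothing is listening at all.

```python
import urllib.error
import urllib.request

# Path copied from the boto3 debug log in this issue.
CREDENTIALS_PATH = "/latest/meta-data/iam/security-credentials/"


def probe_metadata(base_url="http://169.254.169.254", timeout=2.0):
    """Return the HTTP status of the credentials endpoint, or None if unreachable.

    Hypothetical diagnostic helper; not part of aws-vault or boto3.
    """
    # An empty ProxyHandler ignores http_proxy/https_proxy environment settings,
    # so the request goes straight to the metadata address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + CREDENTIALS_PATH, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, unreachable host, or timeout.
        return None


if __name__ == "__main__":
    print("metadata endpoint status:", probe_metadata())
```

Running this from the same environment as the failing script (for example, the PyCharm run configuration) shows whether the 502 comes from the endpoint itself rather than from anything boto3 is doing.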
The workaround is to change the `aws-vault` invocation to the following (note the addition of `-W` when calling `open`):
$ aws-vault --debug exec --server --prompt=osascript dev_profile -- open -W -a PyCharm.app
This keeps the parent `aws-vault` process alive for as long as PyCharm is open, and the EC2 Instance Metadata "emulator" then works.
Issues with existing behavior
The primary problem with the current behavior is that it is very confusing: even with debug logging turned on in the process using the AWS SDK, in `aws-vault`, or in both, neither log makes the cause obvious. Programs using the AWS SDK run through the various credential acquisition mechanisms and then simply report that all of them failed.
The fact that the `aws-vault proxy` processes live on beyond the initial `aws-vault` invocation made this particularly confusing to track down.
Suggested Changes
One or more of the following would help make this failure case less confusing.
- Shut down the proxy processes when the original `aws-vault` parent process exits.
- Return an error message in the response to the client that reflects the failure of the proxy to acquire credentials from the parent (in the event of a race where a metadata server client makes a request that cannot be serviced before the proxy exits).
- Better document the expectations `aws-vault` has of the sub-processes it launches on behalf of the user.
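To illustrate the first suggestion, here is a rough sketch of a helper process stopping itself once its original parent is gone. This is not how aws-vault (which is written in Go) is implemented; it is a Python illustration of the general idea. It assumes a Unix-like system such as macOS, where a child is re-parented when its parent exits. `parent_has_exited`, `serve_until_parent_exits`, and `handle_once` are hypothetical names.

```python
import os
import time


def parent_has_exited(original_ppid):
    """Return True once this process no longer has its original parent.

    Assumes Unix-like behavior: when the parent exits, the child is
    re-parented and os.getppid() starts returning a different PID.
    """
    return os.getppid() != original_ppid


def serve_until_parent_exits(original_ppid, handle_once, poll_seconds=1.0):
    """Call handle_once() repeatedly and return when the original parent is gone.

    handle_once is a hypothetical callable standing in for one unit of proxy work.
    """
    while not parent_has_exited(original_ppid):
        handle_once()
        time.sleep(poll_seconds)
```

In such a design the parent would pass its own PID to the child at launch, since a child that reads `os.getppid()` only after the parent has already exited would record the wrong value.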
Thank you @mtibben, that PR looks great.