boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.

ExpiredTokenException: Error when retrieving credentials from container-role

akefirad opened this issue

Describe the bug

Recently we moved from IRSA to Pod Identity for our pods in the EKS cluster. Since migrating, we have seen that after some time (not sure exactly when), the app can no longer make any AWS calls because of an expired token, failing with the following error:

Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response 400 from container metadata: [18b9740f-d967-4c97-86da-2f50f794b032]: (ExpiredTokenException): The token included in the request is expired: current date/time 2024-02-10T08:38:55.284028Z must be before the expiration date/time 2024-02-09T10:47:44Z., fault: client
The full stack trace:

  ... application code here ...
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/boto3/resources/factory.py", line 581, in do_action
    response = action(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 989, in _make_api_call
    http, parsed_response = self._make_request(
                            ^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 1015, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
               ^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 105, in handler
    return self.sign(operation_name, request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 186, in sign
    auth = self.get_auth_instance(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 301, in get_auth_instance
    frozen_credentials = credentials.get_frozen_credentials()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 634, in get_frozen_credentials
    self._refresh()
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 522, in _refresh
    self._protected_refresh(is_mandatory=is_mandatory_refresh)
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 538, in _protected_refresh
    metadata = self._refresh_using()
               ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 1971, in fetch_creds
    raise CredentialRetrievalError(
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response 400 from container metadata: [18b9740f-d967-4c97-86da-2f50f794b032]: (ExpiredTokenException): The token included in the request is expired: current date/time 2024-02-10T08:38:55.284028Z must be before the expiration date/time 2024-02-09T10:47:44Z., fault: client

Expected Behavior

The session should automatically refresh the token.

Current Behavior

For some reason, after some time (not sure exactly when), the session can no longer refresh the token.

Reproduction Steps

  1. Create a role and grant it access using the EKS Pod Identity feature.
  2. Deploy a pod with the right service account.
  3. After some time (for example, one day), make an AWS call with a session created when the app started.
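Step 3 can be scripted as a long-running probe. This is an illustration, not part of the original report: `run_probe` and `sts_call_from_startup_session` are hypothetical helpers, and STS `get_caller_identity` is just one convenient call that needs no extra IAM permissions (assumption: any signed AWS call exercises the same credential path).

```python
import logging
import time
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pod-identity-probe")  # hypothetical logger name


def run_probe(call: Callable[[], object], interval_seconds: float, iterations: int,
              sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Invoke `call` repeatedly, recording whether each attempt succeeded."""
    results = []
    for i in range(iterations):
        try:
            call()
            log.info("attempt %d succeeded", i)
            results.append(True)
        except Exception as exc:  # e.g. botocore.exceptions.CredentialRetrievalError
            log.error("attempt %d failed: %s", i, exc)
            results.append(False)
        if i < iterations - 1:
            sleep(interval_seconds)  # no sleep after the final attempt
    return results


def sts_call_from_startup_session() -> Callable[[], object]:
    """Build the probe call from a single boto3 session created once, as in step 3."""
    import boto3  # lazy import so run_probe can be exercised without boto3 installed

    client = boto3.Session().client("sts")
    return client.get_caller_identity

# Usage inside the pod (hourly for ~30 hours, spanning both the 6h and 24h marks):
#   run_probe(sts_call_from_startup_session(), interval_seconds=3600, iterations=30)
```

Keeping the session and client alive for the whole run matters: a fresh session per call would hide exactly the long-lived-session behavior being reported.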

Possible Solution

No response

Additional Information/Context

I'm not sure. I checked the token in the pod (curl http://169.254.170.23/v1/credentials -H "Authorization: $(cat $AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE)") and it has a 6-hour expiry. Surprisingly, when I tested well after 6 hours, the session was still healthy (and AWS calls succeeded). But the same pod started throwing the above error after 24 hours.
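For anyone who prefers to run the same check from Python inside the pod, here is a rough standard-library equivalent of the curl command above. It assumes the endpoint may also be read from `AWS_CONTAINER_CREDENTIALS_FULL_URI` (falling back to the literal URL from the curl command) and that the response JSON carries an `Expiration` field; the helper names are hypothetical.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone

# URL from the curl command above; used when AWS_CONTAINER_CREDENTIALS_FULL_URI is unset (assumption).
DEFAULT_URI = "http://169.254.170.23/v1/credentials"


def read_auth_token(path: str) -> str:
    """Read the authorization token file fresh each time, like `$(cat ...)` in curl."""
    with open(path) as f:
        return f.read().strip()


def build_request(uri: str, token: str) -> urllib.request.Request:
    """Same request as the curl call: the raw token goes in the Authorization header."""
    return urllib.request.Request(uri, headers={"Authorization": token})


def seconds_until(expiration: str, now: datetime) -> float:
    """Seconds from `now` until an ISO-8601 UTC timestamp such as 2024-02-09T10:47:44Z."""
    exp = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    return (exp - now).total_seconds()


def fetch_credentials() -> dict:
    uri = os.environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI", DEFAULT_URI)
    token = read_auth_token(os.environ["AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"])
    with urllib.request.urlopen(build_request(uri, token), timeout=5) as resp:
        return json.load(resp)

# Inside the pod:
#   creds = fetch_credentials()
#   print(creds["Expiration"], seconds_until(creds["Expiration"], datetime.now(timezone.utc)))
```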
The fact that the session was healthy after 6 hours tells me the refresh mechanism works, at least for some time, until it doesn't. I'm not sure how the token-refresh logic works in Boto, so I can't really speculate here. Could it be because the pod was idle for too long? FWIW, the pod gets very little traffic, so if there's any lazy evaluation involved in Boto's get-token logic, it might affect the whole experience (i.e., it may matter when the first and subsequent calls are made).
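On the lazy-evaluation question: the traceback above shows the refresh happening inside `get_frozen_credentials()` → `_refresh()` → `_protected_refresh(is_mandatory=...)`, i.e. at request-signing time rather than on a background timer. The toy class below models that on-demand pattern so the idle-pod scenario can be reasoned about. It is not botocore's implementation; the class name, the `expiry_time` key, and the 15/10-minute advisory/mandatory windows are assumptions for illustration.

```python
import threading
import time
from typing import Callable, Dict, Optional


class LazyRefreshingCredentials:
    """Toy model of on-demand refresh: nothing runs in the background, and expiry
    is only checked when credentials are requested (e.g. when signing a request).
    The advisory/mandatory windows are illustrative assumptions."""

    def __init__(self, refresh_using: Callable[[], Dict], clock: Callable[[], float] = time.time,
                 advisory_seconds: float = 15 * 60, mandatory_seconds: float = 10 * 60):
        self._refresh_using = refresh_using  # must return {"expiry_time": <epoch secs>, ...}
        self._clock = clock
        self._advisory = advisory_seconds
        self._mandatory = mandatory_seconds
        self._lock = threading.Lock()
        self._creds: Optional[Dict] = None
        self._expiry = float("-inf")  # forces a refresh on first use

    def get_frozen_credentials(self) -> Dict:
        if self._expiry - self._clock() < self._advisory:
            with self._lock:
                remaining = self._expiry - self._clock()  # re-check: another thread may have refreshed
                if remaining < self._advisory:
                    try:
                        fresh = self._refresh_using()
                        self._creds, self._expiry = fresh, fresh["expiry_time"]
                    except Exception:
                        # In the advisory window, keep serving the old credentials;
                        # in the mandatory window (or with nothing cached), surface the error.
                        if self._creds is None or remaining < self._mandatory:
                            raise
        return dict(self._creds)  # return a copy so callers get a stable snapshot
```

Under this model an idle pod is not a problem by itself: the first call after a long gap simply triggers a refresh. A failure like the one reported would instead have to come from the refresh call itself being rejected, which is consistent with the 400 from the container metadata endpoint in the error.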
There are also several other open issues about the lack of an auto-refresh feature in Boto. Again, the fact that it works after 6 hours but not after, e.g., 24 hours suggests this might be a bug rather than a missing feature.
Happy to provide more information if needed. LMK. Thanks.

SDK version used

1.34.34

Environment details (OS name and version, etc.)

EKS K8S 1.29 Linux amzn2.x86_64 x86_64 GNU/Linux

Hi there @akefirad,

This was likely fixed in today's release with #3114, can you try updating to version 1.34.41 and let us know if your issue is fixed?
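The thread does not say what #3114 changed. One hypothesis consistent with the error (the 400 comes from the container metadata endpoint, and the rejected token's expiration, 2024-02-09T10:47:44Z, is about 22 hours before the failing call) is that the authorization token file was read once and reused after the token on disk had been rotated. The sketch contrasts the two strategies; both class names and `rotate` are hypothetical and are not botocore APIs.

```python
class CachedTokenSource:
    """Reads the token file once at construction; later rotations on disk are never seen."""

    def __init__(self, path: str):
        with open(path) as f:
            self._token = f.read().strip()

    def __call__(self) -> str:
        return self._token


class RereadTokenSource:
    """Re-reads the token file on every credential fetch, so rotations are picked up."""

    def __init__(self, path: str):
        self._path = path

    def __call__(self) -> str:
        with open(self._path) as f:
            return f.read().strip()


def rotate(path: str, new_token: str) -> None:
    """Simulate the token file being replaced on disk with a new token."""
    with open(path, "w") as f:
        f.write(new_token)
```

After a simulated rotation, the cached source keeps returning the stale token while the re-reading source returns the new one, which is the difference that would separate "works at 6 hours" from "fails at 24 hours" under this hypothesis.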

OK, thanks for the update. I'll switch to the latest version and let it run, but it'll take a day or two to be confident. I'll keep you posted.

@akefirad We saw the same error: the ExpiredTokenException, some time after 24 hours. Have you tested the new version of botocore yet?

The service has been running for two days without any issues, so I guess the fix is working.

Great! I'll close out the issue for now, but if anyone else sees it in a version after 1.34.41, please feel free to reopen it.
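To confirm which release a running service actually has before reopening, a minimal check might look like this. It assumes the 1.34.41 number refers to botocore (the repository this issue was filed in) and that versions are plain `X.Y.Z` strings; `has_fix` and `installed_botocore` are hypothetical helpers. Comparing integer tuples rather than raw strings matters here, since `"1.34.100" < "1.34.41"` is true as a string comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = "1.34.41"  # release the maintainer pointed to in this thread


def parse(v: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string (assumption: no rc/dev suffixes)."""
    return tuple(int(part) for part in v.split(".")[:3])


def has_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if `installed` is at or after the release said to contain the fix."""
    return parse(installed) >= parse(fixed_in)


def installed_botocore() -> Optional[str]:
    """Installed botocore version, or None if botocore is not installed."""
    try:
        return version("botocore")
    except PackageNotFoundError:
        return None
```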

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.