ExpiredTokenException: Error when retrieving credentials from container-role
akefirad opened this issue
Describe the bug
Recently we moved from IRSA to Pod Identity for our pods in the EKS cluster. After migrating, we started to see that after some time (not sure exactly when), the app cannot make any AWS calls because of an expired token, failing with the following error:
Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response 400 from container metadata: [18b9740f-d967-4c97-86da-2f50f794b032]: (ExpiredTokenException): The token included in the request is expired: current date/time 2024-02-10T08:38:55.284028Z must be before the expiration date/time 2024-02-09T10:47:44Z., fault: client
The full stack trace:
... application code here ...
File "/opt/pysetup/.venv/lib/python3.12/site-packages/boto3/resources/factory.py", line 581, in do_action
response = action(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/boto3/resources/action.py", line 88, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 553, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 989, in _make_api_call
http, parsed_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/client.py", line 1015, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 105, in handler
return self.sign(operation_name, request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 186, in sign
auth = self.get_auth_instance(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/signers.py", line 301, in get_auth_instance
frozen_credentials = credentials.get_frozen_credentials()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 634, in get_frozen_credentials
self._refresh()
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 522, in _refresh
self._protected_refresh(is_mandatory=is_mandatory_refresh)
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 538, in _protected_refresh
metadata = self._refresh_using()
^^^^^^^^^^^^^^^^^^^^^
File "/opt/pysetup/.venv/lib/python3.12/site-packages/botocore/credentials.py", line 1971, in fetch_creds
raise CredentialRetrievalError(
botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response 400 from container metadata: [18b9740f-d967-4c97-86da-2f50f794b032]: (ExpiredTokenException): The token included in the request is expired: current date/time 2024-02-10T08:38:55.284028Z must be before the expiration date/time 2024-02-09T10:47:44Z., fault: client
Expected Behavior
The session should automatically refresh the token.
Current Behavior
For some reason, after some time (not sure exactly when), the session can no longer refresh the token.
Reproduction Steps
- Create a role and grant it access using EKS Pod Identity feature.
- Deploy a pod with the right service account.
- After some time, for example one day, try to make an AWS call with a session created when the app started.
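The last step can be automated with a small probe loop. This is only a sketch: `probe` and `run` are hypothetical helper names, and STS `get_caller_identity` is an assumed stand-in for whatever AWS call the real app makes. The point is that the same client, created once at startup, is reused for every attempt.

```python
import time
from datetime import datetime, timezone


def probe(sts_client) -> str:
    """Make one cheap AWS call and report whether it succeeded."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    try:
        arn = sts_client.get_caller_identity()["Arn"]
        return f"{stamp} ok {arn}"
    except Exception as exc:  # e.g. botocore's CredentialRetrievalError
        return f"{stamp} FAILED {type(exc).__name__}: {exc}"


def run(sts_client, interval_seconds: float, attempts: int, sleep=time.sleep):
    """Reuse the same client for every attempt, sleeping in between."""
    results = []
    for attempt in range(attempts):
        if attempt:
            sleep(interval_seconds)
        results.append(probe(sts_client))
        print(results[-1], flush=True)
    return results
```

Inside the pod this could be started with something like `run(boto3.Session().client("sts"), interval_seconds=3600, attempts=48)`; a long interval matters here, since the failure was only seen on a mostly idle pod.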
Possible Solution
No response
Additional Information/Context
I'm not sure. I checked the token in the pod (curl http://169.254.170.23/v1/credentials -H "Authorization: $(cat $AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE)") and it has a 6-hour expiry. But surprisingly enough, when I tested long after 6 hours, the session was healthy (and AWS calls were successful). But the same pod, after 24 hours, started to throw the above error.
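For repeated checks, the same request can be made from Python. A minimal sketch, assuming the response is JSON with an `Expiration` field in ISO 8601 form and that `AWS_CONTAINER_CREDENTIALS_FULL_URI` is set in the pod (the URL from the curl command is used as a fallback); the helper names are hypothetical.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone

DEFAULT_URI = "http://169.254.170.23/v1/credentials"  # from the curl command


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-02-09T10:47:44Z."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def seconds_remaining(expiration: str, now: datetime) -> float:
    """Seconds until `expiration`; a negative value means already expired."""
    return (parse_utc(expiration) - now).total_seconds()


def fetch_container_credentials() -> dict:
    """Same request as the curl command: GET with the token as Authorization."""
    uri = os.environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI", DEFAULT_URI)
    with open(os.environ["AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"]) as f:
        token = f.read().strip()
    request = urllib.request.Request(uri, headers={"Authorization": token})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

In the pod, `seconds_remaining(fetch_container_credentials()["Expiration"], datetime.now(timezone.utc))` gives the remaining lifetime. Applied to the two timestamps in the error message, `seconds_remaining` shows the token had been expired for almost 22 hours when the call failed.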
The fact that the session was healthy after 6 hours tells me that the refresh mechanism works at least for some time, until it doesn't. I'm not sure how the refresh-token logic works in Boto, so I can't really speculate here. Could it be because the pod was idle for too long? FWIW, the pod has very little traffic, so if there's any lazy evaluation involved in the get-token logic in Boto, it might affect the whole experience (i.e. the timing of the first and subsequent calls matters).
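To make the lazy-evaluation question concrete, here is a toy model of refresh-on-access. It is not botocore's implementation: `LazyCredentials`, the injected `fetch` and `clock` callables, and the 15-minute window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

# Assumption: refresh when less than this much lifetime remains.
REFRESH_WINDOW = timedelta(minutes=15)


class LazyCredentials:
    """Toy model of lazy refresh; there is no background timer."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, datetime]],
        clock: Callable[[], datetime],
    ):
        self._fetch = fetch
        self._clock = clock
        self._token, self._expiry = fetch()

    def get(self) -> str:
        # The only place a refresh can happen: when a caller asks for
        # credentials and the cached ones are close to (or past) expiry.
        if self._expiry - self._clock() < REFRESH_WINDOW:
            self._token, self._expiry = self._fetch()
        return self._token
```

In this model an idle pod does nothing until its next AWS call, and that call is the first moment the cached token is compared against the clock. If whatever is needed to fetch a new token has itself gone stale by then, the failure would only show up on a low-traffic pod, which would match the pattern described here.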
Also, there are several other open issues about the lack of an auto-refresh feature in Boto. Again, the fact that it works after 6 hours but not after e.g. 24 hours tells me it might be a bug, not a missing feature.
Happy to provide more information if needed. LMK. Thanks.
SDK version used
1.34.34
Environment details (OS name and version, etc.)
EKS K8S 1.29 Linux amzn2.x86_64 x86_64 GNU/Linux
OK, thanks for the update. I'll switch to the latest version and let it run, but it'll take a day or two to be confident. I'll keep you posted.
@akefirad We saw the same error (the ExpiredTokenException) sometime after 24h. Have you tested the new version of botocore yet?
The service has been running for two days without any issue, so I guess the fix is working.
Great! I'll close out the issue for now, but if anyone else sees it in a version after 1.34.41, please feel free to reopen this.
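For anyone checking whether a running pod already has the newer botocore, a quick sketch follows. The helper names are hypothetical, and treating 1.34.41 itself as the first fixed version is an assumption based on the wording of the comment above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MINIMUM = (1, 34, 41)  # assumption: taken from the closing comment


def parse(release: str) -> Tuple[int, ...]:
    """Turn '1.34.41' into (1, 34, 41) for tuple comparison."""
    return tuple(int(part) for part in release.split(".")[:3])


def at_least(installed: str, minimum: Tuple[int, ...] = MINIMUM) -> bool:
    """True if `installed` is the minimum version or newer."""
    return parse(installed) >= minimum


def installed_botocore() -> Optional[str]:
    """Version of botocore in the current environment, or None if absent."""
    try:
        return version("botocore")
    except PackageNotFoundError:
        return None
```

Since boto3 and botocore are versioned separately, the check should be run against botocore rather than the boto3 version alone. The originally reported 1.34.34 comes out below the threshold: `at_least("1.34.34")` is `False`.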