aio-libs / aiobotocore

asyncio support for botocore library using aiohttp

Home Page: https://aiobotocore.rtfd.io

AWS_METADATA_SERVICE_NUM_ATTEMPTS effectively ignored due to uncaught botocore exceptions

sndrtj opened this issue

Describe the bug
The AWS_METADATA_SERVICE_NUM_ATTEMPTS environment variable effectively gets ignored in many cases due to uncaught botocore errors in the AioIMDSFetcher.

Botocore has the following retryable exceptions:

  • ReadTimeoutError
  • EndpointConnectionError
  • ConnectionClosedError
  • ConnectTimeoutError

Aiobotocore, however, only retries on asyncio and aiohttp exceptions, and does not retry on the exceptions raised by botocore.
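
The mismatch can be sketched in isolation. This is a minimal illustration, not the aiobotocore source: TransportError, TranslatedError, fetch_with_retries, make_flaky_request, and run are all hypothetical names, and the loop simply re-raises on the last attempt. It assumes the request layer has already translated a low-level error into a botocore-style exception before the retry loop sees it.

```python
import asyncio


# Hypothetical stand-ins for the two exception families; not the real classes.
class TransportError(Exception):
    """Plays the role of a raw asyncio/aiohttp error."""


class TranslatedError(Exception):
    """Plays the role of a botocore error such as ConnectTimeoutError."""


async def fetch_with_retries(request, num_attempts, retryable):
    """Call request() up to num_attempts times, retrying only on `retryable`."""
    for attempt in range(1, num_attempts + 1):
        try:
            return await request()
        except retryable:
            # Out of attempts: give up and surface the last error.
            if attempt == num_attempts:
                raise


def make_flaky_request(failures, calls):
    """Build a request that raises TranslatedError `failures` times, then succeeds."""
    async def request():
        calls.append(1)
        if len(calls) <= failures:
            raise TranslatedError("simulated connect timeout")
        return "credentials"
    return request


def run(retryable):
    """Return (result, number of calls made) for a given retryable tuple."""
    calls = []
    request = make_flaky_request(failures=2, calls=calls)
    try:
        result = asyncio.run(fetch_with_retries(request, 10, retryable))
    except TranslatedError:
        result = None
    return result, len(calls)
```

With retryable=(TransportError,), the first TranslatedError escapes the except clause and the call fails after a single attempt, regardless of the configured attempt count. With retryable=(TranslatedError,), the same request succeeds on the third attempt.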

Example

The following is an illustrative example:

import aiobotocore.session
import aiobotocore.credentials
import asyncio
import logging


async def main():
    session = aiobotocore.session.get_session()
    tasks = [aiobotocore.credentials.get_credentials(session) for _ in range(1000)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    asyncio.run(main())

Now run this on an EC2 instance as AWS_METADATA_SERVICE_NUM_ATTEMPTS=10 python test.py. It will likely fail (it may take a couple of tries) with either a ConnectionClosedError or a ConnectTimeoutError, and the logs will not contain the expected "Caught retryable HTTP exception while making metadata service request" message.

Botocore example that does work as expected:

import botocore.session
import botocore.credentials
import logging


def main():
    session = botocore.session.get_session()
    for _ in range(1000):
        botocore.credentials.get_credentials(session)


if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    main()

Checklist

  • I have reproduced in environment where pip check passes without errors
  • I have provided pip freeze results
  • I have provided sample code or detailed way to reproduce
  • I have tried the same code in botocore to ensure this is an aiobotocore specific issue
  • I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
  • I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection

pip freeze results

aiobotocore==2.4.1
aiohttp==3.8.3
aioitertools==0.11.0
aiosignal==1.3.1
async-timeout==4.0.2
attrs==22.2.0
botocore==1.27.59
charset-normalizer==2.1.1
frozenlist==1.3.3
idna==3.4
jmespath==1.0.1
multidict==6.0.3
python-dateutil==2.8.2
six==1.16.0
typing_extensions==4.4.0
urllib3==1.26.13
wrapt==1.14.1
yarl==1.8.2

Environment:

  • Python Version: 3.8
  • OS name and version: Ubuntu 22.04

Additional context
I encountered this issue while experiencing errors in dvc with dvc pull. DVC seems to hit get_credentials for just about every object it retrieves from S3. My repository has about 7k objects, which seems to be more than enough to trigger this behaviour.

This bug is probably related to #961

thanks! will look into this asap

Yeah, this looks like an oversight: we never swapped in the botocore exceptions after we started translating exceptions. Could you try importing RETRYABLE_HTTP_ERRORS from botocore.utils instead?

Yes, when I patch RETRYABLE_HTTP_ERRORS with the ones from botocore.utils it works as expected :-).

from botocore.utils import RETRYABLE_HTTP_ERRORS
import aiobotocore.session
import aiobotocore.credentials
import aiobotocore.utils
import asyncio
import logging

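# Workaround: replace aiobotocore's retryable-error tuple with botocore's,
# so the translated botocore exceptions are retried as well.
aiobotocore.utils.RETRYABLE_HTTP_ERRORS = RETRYABLE_HTTP_ERRORS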


async def main():
    session = aiobotocore.session.get_session()
    tasks = [aiobotocore.credentials.get_credentials(session) for _ in range(1000)]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    logging.basicConfig(level="DEBUG")
    asyncio.run(main())
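
A rebind like the one above only takes effect if the consuming code looks the name up in its module globals on every call, which the result reported here suggests is the case. A minimal sketch of that mechanism, assuming nothing about aiobotocore internals: TransportError, TranslatedError, RETRYABLE_ERRORS, is_retryable, and check_with_patch are hypothetical names, and mock.patch.dict is used so the override is undone on exit.

```python
from unittest import mock


# Hypothetical stand-ins; not the real aiobotocore or botocore names.
class TransportError(Exception):
    pass


class TranslatedError(Exception):
    pass


RETRYABLE_ERRORS = (TransportError,)


def is_retryable(exc):
    # The tuple is looked up in module globals on every call,
    # so a later rebind of RETRYABLE_ERRORS is picked up here.
    return isinstance(exc, RETRYABLE_ERRORS)


def check_with_patch(exc):
    # Temporarily rebind the module-level tuple; restored when the block exits.
    with mock.patch.dict(globals(), {"RETRYABLE_ERRORS": (TranslatedError,)}):
        return is_retryable(exc)
```

Scoping the override this way keeps the workaround from leaking into unrelated code, at the cost of having to wrap every call that needs it.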

will get a patch out for this

Thanks for the quick resolution!

What I had to do for the tests in that PR is why I hate unit tests and prefer integration tests. Integration tests would have caught this issue instead of providing a false sense of security.