aiobotocore blocks the event loop with I/O in several locations
ekzhang opened this issue
Describe the bug
Hi, we've been using aiobotocore through aioboto3, a thin wrapper, in a production environment to access S3 resources:

import aioboto3

session = aioboto3.Session()
async with session.client("s3") as s3:
    pass
This code directly calls into aiobotocore. I noticed through runtime profiling that on a server in production, aiobotocore blocks the main executor thread, preventing any other tasks from running in the asyncio event loop. Here is an example stack trace (aiobotocore v2.4.2) that points to a line of code that loads SSL locations from the file system, using blocking I/O on the main thread:
Thread 1 (active): "MainThread"
__init__ (aiobotocore/httpsession.py:109)
__init__ (aiobotocore/utils.py:38)
__init__ (aiobotocore/utils.py:92)
create_credential_resolver (aiobotocore/credentials.py:91)
_create_credential_resolver (aiobotocore/session.py:51)
get_component (botocore/session.py:1081)
get_credentials (aiobotocore/session.py:80)
_create_client (aiobotocore/session.py:169)
__aenter__ (aiobotocore/session.py:26)
aiobotocore/aiobotocore/httpsession.py, lines 107 to 110 in 14b2f5f
From more runtime profiling of the S3 provider on aiobotocore v2.4.2, besides line 109 of httpsession.py that reads SSL contexts, I noticed stack traces for other locations in aiobotocore that also run blocking I/O on the main thread, also inside _create_client:

- Again inside _create_client, on line 170, it calls _get_internal_component on the session (aiobotocore/aiobotocore/session.py, lines 170 to 173 in 14b2f5f), which goes into the botocore package and eventually ends up calling open(file).read() on the main thread.
- Inside client.py's create_client function, on line 47, it calls _load_service_model from botocore (aiobotocore/aiobotocore/client.py, lines 46 to 48 in 14b2f5f), which eventually calls into the same botocore function as above that uses blocking I/O to read JSON configuration from the file system.
The first blocking code, though, which reads SSL contexts at aiobotocore/httpsession.py:109, seems to occur in the most places, since it's also called downstream of functions like create_credential_resolver(), Session._create_client(), and AioRequestSigner.sign(), which is used in S3 methods involving presigned URLs.
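Not part of the original report, but for anyone trying to reproduce this kind of issue without a profiler: asyncio's debug mode logs any callback that holds the loop longer than `slow_callback_duration`. The sketch below uses a `time.sleep` stand-in for the synchronous SSL/JSON loading described above (the stand-in is an assumption, not aiobotocore code):

```python
import asyncio
import time


def blocking_io():
    # Stand-in for the synchronous SSL-context / service-model loading
    # that happens during client creation (hypothetical placeholder).
    time.sleep(0.2)


async def main():
    loop = asyncio.get_running_loop()
    loop.set_debug(True)              # asyncio will log slow callbacks
    loop.slow_callback_duration = 0.1  # seconds; the default is 0.1
    start = loop.time()
    blocking_io()                     # runs on the event-loop thread, blocking it
    return loop.time() - start


if __name__ == "__main__":
    blocked = asyncio.run(main())
    print(f"loop was blocked for ~{blocked:.2f}s")
```

With debug mode on, asyncio emits a warning like "Executing <Task ...> took 0.2 seconds" to the asyncio logger, which makes loop-blocking calls visible in ordinary logs.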
Checklist

- I have reproduced in an environment where pip check passes without errors
- I have provided pip freeze results
- I have provided sample code or a detailed way to reproduce
- I have tried the same code in botocore to ensure this is an aiobotocore specific issue
- I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
- I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection
pip freeze results
aioboto3==10.4.0
aiobotocore==2.4.2
aiohttp==3.8.4
aioitertools==0.11.0
aiosignal==1.3.1
async-timeout==4.0.2
attrs==23.1.0
botocore==1.27.59
charset-normalizer==3.2.0
frozenlist==1.3.3
idna==3.4
jmespath==1.0.1
multidict==6.0.4
python-dateutil==2.8.2
six==1.16.0
typing-extensions==4.7.1
urllib3==1.26.16
wrapt==1.15.0
yarl==1.9.2
Environment:
- Python Version: 3.9
- OS name and version: Ubuntu 20.04
there isn't a core async file-system concept yet in Python; I don't think we're going to spawn threads to fix this
Makes sense, thanks for the reply. I understand that there are tradeoffs, and I appreciate your attention to simplicity in the project.
If it helps bring some color for others, though: we ended up not using aiobotocore due to the event loop-blocking issue, since it caused large tail latency spikes (>300 ms) in our server when requests that each created clients would queue up. We saw our tail latencies drop more than 20x immediately after replacing it with plain boto3 operations on another thread.
those file costs are incurred just once per session/client IIRC; you can probably fix them by keeping your session/client alive for the life of your application
note that clients are heavyweight, as each client is associated with a connection pool
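The "keep your session/client alive" advice above can be sketched as a small cache around any async-context-manager client factory. This is a generic sketch, not aiobotocore API; with aiobotocore you would pass something like `lambda: session.create_client("s3")` as the factory (an assumption about the caller's setup):

```python
import asyncio


class CachedAsyncClient:
    """Create an expensive async-context-manager client once per process
    and reuse it, so the blocking setup cost is paid a single time."""

    def __init__(self, cm_factory):
        self._cm_factory = cm_factory  # callable returning an async CM
        self._cm = None
        self._client = None
        self._lock = asyncio.Lock()

    async def get(self):
        # The lock prevents two concurrent tasks from both creating a client.
        async with self._lock:
            if self._client is None:
                self._cm = self._cm_factory()
                self._client = await self._cm.__aenter__()
        return self._client

    async def close(self):
        # Call once on application shutdown to release the connection pool.
        async with self._lock:
            if self._cm is not None:
                await self._cm.__aexit__(None, None, None)
                self._cm = self._client = None
```

Construct the cache inside a running event loop and call `await cache.get()` wherever a client is needed; repeated calls return the same client instead of re-creating connection pools and SSL contexts.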
@ekzhang: On an off-topic note, can you please share how you are doing runtime profiling of your Python code?
ok ya, your session and client should be long-lived, that's just a fact of life with botocore. You don't want to keep re-creating the connection pools or SSL contexts.
For one more perspective, I just bumped into this same issue with a distributed task queue system with multiple workers. For architectural reasons I have to initialize aiobotocore clients per task (we don't know the credentials until the task runs). The code:
async with session.create_client("s3") as s3
blocks the main event loop for ~5 to 10 seconds. I don't mind waiting the seconds on every task startup, but blocking the main event loop causes all sorts of heartbeat timeouts on the workers.
Running the blocking calls @ekzhang mentioned using asyncio.to_thread() or something would solve this for my use case. I'm trying to decide now if there's a way to hack around this (maybe some kind of session caching system per worker) or if I also need to switch to regular botocore in my own threads.
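The `asyncio.to_thread()` idea mentioned above can be sketched like this (Python 3.9+). The blocking constructor here is a `time.sleep` stand-in; with boto3 it would be something like `asyncio.to_thread(boto3.client, "s3")` (an assumption about the caller's setup). The heartbeat task shows the loop stays responsive while the client is built:

```python
import asyncio
import time


def make_client():
    # Stand-in for a blocking constructor such as boto3.client("s3")
    # (hypothetical placeholder; any blocking factory works the same way).
    time.sleep(0.3)
    return "client"


async def heartbeat(ticks):
    # These iterations only complete on schedule if the loop isn't blocked.
    for _ in range(6):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    # to_thread runs the blocking call in a worker thread instead of
    # on the event-loop thread, so other tasks keep running.
    client = await asyncio.to_thread(make_client)
    await hb
    return client, ticks


if __name__ == "__main__":
    client, ticks = asyncio.run(main())
    print(client, f"- {len(ticks)} heartbeats fired during creation")
```

Running `make_client()` directly inside `main()` instead would stall the heartbeat task for the full 0.3 s, which is exactly the worker-timeout symptom described above.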
File IO blocking aside, aiobotocore client init seems slower than botocore in general:
import time

import aiobotocore.session
import boto3

async def main():
    t1 = time.perf_counter()
    async with aiobotocore.session.get_session().create_client("s3") as s3:
        pass
    t2 = time.perf_counter()
    print("aiobotocore elapsed:", t2 - t1)

    t1 = time.perf_counter()
    s3 = boto3.session.Session().client("s3")
    t2 = time.perf_counter()
    print("boto3 session elapsed:", t2 - t1)

    t1 = time.perf_counter()
    s3 = boto3.client("s3")
    t2 = time.perf_counter()
    print("boto3 client elapsed:", t2 - t1)

if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
output:
$ py dev\aioboto_speeds.py
aiobotocore elapsed: 2.447070299880579
boto3 session elapsed: 1.494525299873203
boto3 client elapsed: 1.3479881000239402
In my real-world task worker usage I saw an even bigger difference, going from around ~4 seconds with aiobotocore to ~0.5 seconds with asyncio.to_thread(boto3.client, "s3").
I can open a new issue if you think this is worth looking into. That time really adds up in a concurrent task queue situation (especially when the main event loop blocks too), but I understand if you feel that's acceptable for what should only be a once-per-process client session.