S3 Express Session opened for all asyncio calls
LouisAuneau opened this issue
Hello,
Describe the bug
We aim to improve our upload speed using S3 Express One Zone. With the following botocore code, files upload in a reasonable time:
import botocore.session
bucket = 'my-bucket--eun1-az1--x-s3'
region = 'eu-north-1'
session = botocore.session.get_session()
client = session.create_client('s3', region_name=region)
# Upload files one at a time (`dataset` is an iterable of file contents defined elsewhere)
for i, file in enumerate(dataset):
    client.put_object(
        Body=file,
        Bucket=bucket,
        Key=f'file_{i}.jpg'
    )
However, I wanted to improve performance further by running the uploads concurrently with aiobotocore:
import asyncio
import aiobotocore.session
bucket = 'my-bucket--eun1-az1--x-s3'
region = 'eu-north-1'
async def main():
    session = aiobotocore.session.get_session()
    async with session.create_client('s3', region_name=region) as client:
        # Create one put_object coroutine per file, then run them all concurrently
        tasks = []
        for i, file in enumerate(dataset):
            tasks.append(client.put_object(
                Body=file,
                Bucket=bucket,
                Key=f'file_{i}.jpg'
            ))
        await asyncio.gather(*tasks)
asyncio.run(main())
However, when several put_object calls run concurrently, the CreateSession endpoint seems to be called once per put_object call, which leads to the following ClientError:
ClientError: An error occurred (SlowDown) when calling the CreateSession operation (reached max retries: 4): Reduce your request rate.
Checklist
- I have reproduced in an environment where pip check passes without errors
- I have provided pip freeze results
- I have provided sample code or a detailed way to reproduce
- I have tried the same code in botocore to ensure this is an aiobotocore-specific issue
- I have tried similar code in aiohttp to ensure this is an aiobotocore-specific issue -> Time-consuming, as it requires re-implementing the AWS authentication system.
- I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection -> The version used is both the first and the latest one that supports S3 Express One Zone.
pip freeze results
absl-py==1.4.0
aiobotocore==2.9.0
aiofile==3.8.8
aiofiles==23.2.1
aiohttp==3.9.1
aiohttp-retry==2.8.3
aioitertools==0.11.0
aiopath==0.5.12
aiosignal==1.3.1
anyio==3.7.1
asn1crypto==1.5.1
asttokens==2.4.1
async-timeout==4.0.3
asyncpg==0.29.0
attrs==23.2.0
awscrt==0.19.17
azure-devops==6.0.0b4
backoff==2.2.1
boto3==1.33.13
botocore==1.33.2
botocore-stubs==1.34.17
cachetools==5.3.2
caio==0.9.13
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
cloud-sql-python-connector==1.5.0
cloudpickle==2.2.1
comm==0.2.1
cryptography==41.0.7
debugpy==1.8.0
decorator==5.1.1
Deprecated==1.2.14
docstring-parser==0.15
exceptiongroup==1.2.0
executing==2.0.1
fire==0.5.0
frozenlist==1.4.1
gcloud-aio-auth==4.2.3
gcloud-aio-storage==8.3.0
google-api-core==1.34.0
google-api-python-client==1.12.8
google-auth==2.26.2
google-auth-httplib2==0.1.1
google-auth-oauthlib==0.4.6
google-cloud-core==2.4.1
google-cloud-pubsub==2.19.0
google-cloud-storage==2.14.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
greenlet==3.0.3
grpc-google-iam-v1==0.13.0
grpcio==1.60.0
grpcio-status==1.48.2
grpcio-tools==1.48.2
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.2
httplib2==0.22.0
httpx==0.26.0
hyperframe==6.0.1
idna==3.6
importlib-metadata==6.11.0
ipykernel==6.28.0
ipython==8.18.1
isodate==0.6.1
jedi==0.19.1
jmespath==1.0.1
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
kfp==1.8.22
kfp-pipeline-spec==0.1.16
kfp-server-api==1.8.5
kubernetes==25.3.0
matplotlib-inline==0.1.6
msrest==0.6.21
multidict==6.0.4
nest-asyncio==1.5.8
notion-client==2.2.1
numpy==1.23.5
oauthlib==3.2.2
opencv-python==4.9.0.80
packaging==23.2
pandas==2.0.3
parso==0.8.3
pexpect==4.9.0
pg8000==1.30.4
Pillow==9.5.0
platformdirs==4.1.0
portalocker==2.8.2
prompt-toolkit==3.0.43
proto-plus==1.23.0
protobuf==3.20.3
psutil==5.9.7
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.10.13
PyGithub==1.59.1
Pygments==2.17.2
PyJWT==2.8.0
PyNaCl==1.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
qdrant-client==1.7.0
referencing==0.32.1
requests==2.31.0
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
rpds-py==0.16.2
rsa==4.9
s3transfer==0.8.2
scramp==1.4.4
semantic-version==2.10.0
six==1.16.0
slackclient==2.9.4
sniffio==1.3.0
SQLAlchemy==2.0.25
stack-data==0.6.3
strip-hints==0.1.10
tabulate==0.9.0
termcolor==2.4.0
toml==0.10.2
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
typer==0.9.0
types-aiobotocore==2.9.0
types-aiobotocore-s3==2.9.0
types-awscrt==0.20.0
typing_extensions==4.9.0
tzdata==2023.4
uritemplate==3.0.1
urllib3==1.26.18
wcwidth==0.2.13
websocket-client==1.7.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.17.0
Environment:
- Python Version: 3.9.4
- OS name and version: Debian 12
Thank you for reporting the issue. I have a few ideas and would appreciate your help checking them and reporting back:
- We had to implement a custom credential cache that might be to blame. Could you please try awaiting the first put_object task before gathering the others?
    await tasks[0]
    await asyncio.gather(*tasks[1:])
  If that helps, I will attempt to improve the caching logic.
- According to the docs for PutObject:
  When you use this operation with a directory bucket, you must use virtual-hosted-style requests in the format Bucket_name.s3express-az_id.region.amazonaws.com.
  Have you tried providing the bucket name in this format?
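The host format quoted from the PutObject docs can be derived mechanically from a directory bucket name. The sketch below assumes the naming scheme base-name--az-id--x-s3 (as in the bucket name used later in this thread); `express_virtual_host` is a hypothetical helper for illustration only, not part of botocore or aiobotocore.

```python
EXPRESS_SUFFIX = "--x-s3"


def express_virtual_host(bucket: str, region: str) -> str:
    """Build Bucket_name.s3express-az_id.region.amazonaws.com.

    Assumes the directory bucket naming scheme base-name--az-id--x-s3.
    """
    if not bucket.endswith(EXPRESS_SUFFIX):
        raise ValueError(f"not a directory bucket name: {bucket!r}")
    stem = bucket[: -len(EXPRESS_SUFFIX)]  # e.g. "my-bucket--eun1-az1"
    # The AZ ID is whatever follows the last "--" in the remaining stem
    base, sep, az_id = stem.rpartition("--")
    if not sep or not base or not az_id:
        raise ValueError(f"cannot find an AZ ID in {bucket!r}")
    return f"{bucket}.s3express-{az_id}.{region}.amazonaws.com"


print(express_virtual_host("my-bucket--eun1-az1--x-s3", "eu-north-1"))
# my-bucket--eun1-az1--x-s3.s3express-eun1-az1.eu-north-1.amazonaws.com
```

The function only illustrates the documented host format; it does not change which value is passed as the Bucket parameter.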
Thank you for your quick feedback. I tested both of your points:
- Awaiting the first task before the others are executed does indeed work. Let me know if I can help investigate the caching logic.
- When providing the full endpoint (my-bucket--eun1-az1--x-s3.s3express-eun1-az1.eu-north-1.amazonaws.com in my example), I surprisingly get a NoSuchBucket error.
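Until a fix lands, the confirmed workaround (await one call, then gather the rest) can be wrapped in a small helper. This is a sketch under the assumption that the first successful call leaves a reusable session in the client's credential cache, which is what the test above suggests; `gather_after_first` is a hypothetical name. It may not help once the cached session expires, since concurrent refreshes could race again.

```python
import asyncio


async def gather_after_first(coros):
    """Await the first coroutine on its own, then run the rest concurrently.

    The first call is expected to populate the session cache, so the
    concurrent calls that follow can reuse it instead of each one calling
    CreateSession. Results are returned in input order.
    """
    coros = list(coros)
    if not coros:
        return []
    try:
        first = await coros[0]
    except BaseException:
        for coro in coros[1:]:
            coro.close()  # avoid "coroutine was never awaited" warnings
        raise
    rest = await asyncio.gather(*coros[1:])
    return [first, *rest]


# Usage inside the reporter's async function:
# results = await gather_after_first(
#     client.put_object(Body=file, Bucket=bucket, Key=f'file_{i}.jpg')
#     for i, file in enumerate(dataset)
# )
```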
> Awaiting the first task before the others are executed does indeed work. Let me know if I can help investigate the caching logic.
Great! I will prepare a fix ASAP.
> When providing the full endpoint (my-bucket--eun1-az1--x-s3.s3express-eun1-az1.eu-north-1.amazonaws.com in my example), I surprisingly get a NoSuchBucket error.
Well, it was worth a shot.
@jakob-keller Ah, I missed that; yes, it needs to use an asyncio.Lock.
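Taken together, the replies point to a check-then-fill race: every task that reaches the credential cache before the first CreateSession response arrives sees an empty cache and issues its own CreateSession call. The sketch below is a self-contained simulation with no AWS calls; `UnlockedSessionCache`, `LockedSessionCache` and `fake_create_session` are hypothetical names, not aiobotocore's internals. It compares a racy cache with one that serialises refills through an asyncio.Lock and re-checks the cache after acquiring it.

```python
import asyncio


class UnlockedSessionCache:
    """Hypothetical cache that checks and fills without a lock (racy)."""

    def __init__(self, create_session):
        self._create_session = create_session
        self._session = None

    async def get(self):
        if self._session is None:
            # Every task arriving before the first call returns sees an
            # empty cache and issues its own CreateSession-like call.
            self._session = await self._create_session()
        return self._session


class LockedSessionCache(UnlockedSessionCache):
    """Same cache, but refills are serialised with an asyncio.Lock."""

    def __init__(self, create_session):
        super().__init__(create_session)
        self._lock = asyncio.Lock()

    async def get(self):
        if self._session is None:
            async with self._lock:
                # Re-check: another task may have filled the cache
                # while this one was waiting for the lock.
                if self._session is None:
                    self._session = await self._create_session()
        return self._session


async def count_create_session_calls(cache_cls, concurrency=50):
    calls = 0

    async def fake_create_session():
        # Hypothetical stand-in for the CreateSession round trip
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"token": "session-token"}

    cache = cache_cls(fake_create_session)
    await asyncio.gather(*(cache.get() for _ in range(concurrency)))
    return calls


unlocked_calls = asyncio.run(count_create_session_calls(UnlockedSessionCache))
locked_calls = asyncio.run(count_create_session_calls(LockedSessionCache))
print(unlocked_calls, locked_calls)  # 50 1
```

The second check inside the lock is what keeps waiting tasks from repeating the call once the first one has filled the cache.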