terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend


Exception raised in a coroutine is not handled correctly

rlindsberg opened this issue

  • Async AWS SDK for Python version: 11.2.0
  • Python version: 3.9.18
  • Operating System: Ubuntu 22.04.2 LTS x86_64

Description

When uploading a file larger than 83886080000 bytes, the upload stalls with no error. Investigation shows that an exception is raised, but it is not handled correctly.
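For reference, that threshold is boto3's default 8 MiB multipart chunk size times S3's 10,000-part cap, assuming upload_file ends up using a default TransferConfig:

from boto3.s3.transfer import TransferConfig

# boto3's default multipart chunk size is 8 MiB (8388608 bytes), and S3
# allows at most 10,000 parts per multipart upload.
config = TransferConfig()
print(config.multipart_chunksize * 10_000)  # 83886080000

Any file bigger than that needs a larger chunk size to stay within the part limit.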

The program stalls on

https://github.com/terrycain/aioboto3/blob/0fd1896a9009b7ce1d8a8934f0faf14653c6378d/aioboto3/s3/inject.py#L318-L322

What I Did

fallocate -l 80G ubuntu-80g

Then upload that file to S3 with aioboto3.

You may use this fork to see some debug logs:

https://github.com/rlindsberg/aioboto3/tree/investigate-exception-not-handled-correctly

Yeah, I can see a few places this might crop up. Can you share the test code to reproduce this?

Hey man, I am running production code from my client, so I am afraid I cannot share it. But I will do my best to assist you.

I also have a few theories about where it may be going wrong. Can I have a go at fixing it?

Feel free to give it a go, though if you can't reproduce it with a simple example, I can't easily carve out time to look into it.

Thanks! Here is some sample code you can use to reproduce the bug:

import asyncio
import logging
import os

import aioboto3

upload_folder_path = '/tmp/test'
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')


async def upload_file(file_name):
    try:
        session = aioboto3.Session()

        async with session.client('s3', region_name='eu-west-1') as s3_client:
            file_path = os.path.join(upload_folder_path, file_name)

            # Stalls here when the file needs more than 10,000 parts.
            await s3_client.upload_file(Filename=file_path, Bucket='test-bucket-for-s3-sync', Key=file_name)

    except Exception as e:
        # Never reached for the oversized file: the upload hangs instead of raising.
        logging.exception(e)


async def run():
    # Upload every file in the folder concurrently.
    files_to_be_uploaded = os.listdir(upload_folder_path)
    await asyncio.gather(*(upload_file(f) for f in files_to_be_uploaded))


if __name__ == '__main__':
    asyncio.run(run())

I think I was wrong in my first comment; the program stalls on
https://github.com/terrycain/aioboto3/blob/0fd1896a9009b7ce1d8a8934f0faf14653c6378d/aioboto3/s3/inject.py#L315

L319 will be executed when file_reader_future finishes. When the file being uploaded is larger than 83886080000 bytes, every part with a part number greater than 10,000 is rejected by AWS, which prevents the io_queue from being emptied. The io_queue has a maximum length of only 100, so file_reader waits forever to insert a new item into the queue, and therefore file_reader_future never finishes.
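The mechanism can be shown in isolation with a bounded queue whose consumer dies while the producer is still writing. This is just a minimal sketch with made-up names, not aioboto3's actual code:

import asyncio


async def file_reader(queue):
    # Stands in for aioboto3's file reader: keeps feeding chunks until EOF.
    for chunk in range(200):
        await queue.put(chunk)  # blocks forever once the queue stays full


async def uploader(queue):
    # Stands in for an upload task: fails partway through, like S3
    # rejecting every part number above 10,000.
    for _ in range(50):
        await queue.get()
    raise RuntimeError('part number exceeds 10000')


async def main():
    io_queue = asyncio.Queue(maxsize=10)  # the real io_queue is bounded too
    reader = asyncio.create_task(file_reader(io_queue))
    worker = asyncio.create_task(uploader(io_queue))  # reference kept so the task is not collected
    # Mirrors awaiting file_reader_future: the uploader's exception is never
    # observed here, the queue never drains, and this await hangs forever.
    await reader


asyncio.run(main())

Running this hangs exactly like the upload does; the RuntimeError is never observed.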

Maybe we can work together to patch this? I really like this project and I would like to contribute to it!
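One possible direction, as a sketch only (made-up names, not a concrete proposal for inject.py): instead of awaiting the reader unconditionally, fail fast when any task raises, and cancel the rest so nothing stays blocked on the queue:

import asyncio


async def run_with_fail_fast(reader_coro, worker_coros):
    # Whichever task raises first wins; everything else is cancelled so no
    # task is left blocked on queue.put() or queue.get().
    tasks = [asyncio.create_task(coro) for coro in (reader_coro, *worker_coros)]
    try:
        done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
        for task in done:
            task.result()  # re-raises the first failure, if any
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)

With something like that, S3's too-many-parts error would propagate to the caller instead of leaving file_reader stuck on a full io_queue.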