terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend


Since 13.0.0 async streaming uploads don't read the whole stream.

szymonzmilczakpandadoc opened this issue · comments

  • Async AWS SDK for Python version: 13.0.0
  • Python version: 3.11
  • Operating System: Linux

Description

I'm trying to stream-download a file from a URL and stream-upload it to S3.
In 12.4.0 this code works fine. In 13.0.0 it uploads only the first couple hundred bytes.

What I Did

import asyncio
import aiohttp
import aioboto3


async def main():
    url = "https://pdfobject.com/pdf/sample.pdf"  # size: 18810 bytes
    boto_session = aioboto3.Session()
    async with aiohttp.ClientSession() as http_session:
        async with http_session.get(url) as response:
            async with boto_session.client("s3") as s3:
                await s3.upload_fileobj(
                    Fileobj=response.content, Bucket="bucket", Key="sample.pdf"
                )
                # uploads only about the first 899 bytes


asyncio.run(main())

Yeah, something is wrong here; I'm having a look.

OK, I've fixed it. I'll release it later tonight, as I want to add some tests that exercise this behaviour.

What happened is that there was a naive assumption that .read(num_bytes) would return at most num_bytes, and that a short read meant the stream was exhausted. This is very much not the case: only an empty result (b'') from .read(...) signals that there is nothing left to consume. So when the aiohttp stream returned fewer bytes than the multipart threshold, the code took the quick path of a single .put_object(...) instead of the multipart dance. The fix is simple enough: loop and consume data until either EOF or the multipart threshold is reached, then continue.

This should be fixed in v13.0.1

Thank you! I'll test once you publish 13.0.1.

It's out :)