terricain / aioboto3

Wrapper to use boto3 resources with the aiobotocore async backend


Since 13.0.0 async streaming uploads don't read the whole stream.

szymonzmilczakpandadoc opened this issue · comments

  • Async AWS SDK for Python version: 13.0.0
  • Python version: 3.11
  • Operating System: Linux

Description

I'm trying to stream-download a file from a URL and stream-upload it to S3.
In 12.4.0 this code works fine. In 13.0.0 it uploads only the first couple hundred bytes.

What I Did

import asyncio
import aiohttp
import aioboto3


async def main():
    url = "https://pdfobject.com/pdf/sample.pdf"  # size: 18810 bytes
    boto_session = aioboto3.Session()
    async with aiohttp.ClientSession() as http_session:
        async with http_session.get(url) as response:
            async with boto_session.client("s3") as s3:
                await s3.upload_fileobj(
                    Fileobj=response.content, Bucket="bucket", Key="sample.pdf"
                )
                # uploads only about the first 899 bytes


asyncio.run(main())

Yeah, something is wrong here; I'm having a look.

OK, I've fixed it. I'll release it later tonight, as I want to add some tests that exercise this behaviour.

What happened is that there was a naive assumption that .read(num_bytes) would return at most num_bytes, and that a short read meant the stream was exhausted. This is very much not the case: only an empty result (b'') from .read(...) signals that there is nothing left to consume. So when the aiohttp stream returned fewer bytes than the multipart threshold, the code took the quick path of a single .put_object(...) instead of the multipart dance. The fix is simple enough: loop and consume data until either EOF or the multipart threshold is reached, then continue.

This should be fixed in v13.0.1

Thank you! I'll test once you publish 13.0.1.

It's out :)