awslabs / aws-sdk-kotlin

Multiplatform AWS SDK for Kotlin

S3 PUT: Setting Content-Length can cause `HttpException`

mgroth0 opened this issue

Describe the bug

Setting the Content-Length in an S3 putObject request can sometimes cause an HttpException. I have narrowed it further: in the test below, only the ByteStream created with toByteStream triggers the bug, and only when contentLength is also set on the request.

Expected behavior

  1. The error should be more informative. It is confusing and hard to diagnose when it says "expected 3 bytes but received 175" when you know for sure you are only sending 3 bytes.

  2. Behavior should be more consistent across ByteStream instances. It doesn't make sense that swapping one ByteStream for another would cause a bug like this. Conceptually the ByteStream we provide is just a source of bytes. When swapping which ByteStream we use, we do not expect any functional changes to how the HTTP request is sent. Maybe performance-related changes, but nothing major like this.

  3. Setting the content length in our put-object request to be the exact length of the bytes we are providing should never cause an error like this. If bytes are added internally somewhere in the pipeline, then that should be handled internally.

Current behavior

None of the above expectations are met.

Steps to Reproduce

import aws.sdk.kotlin.services.s3.S3Client
import aws.sdk.kotlin.services.s3.putObject
import aws.smithy.kotlin.runtime.content.ByteStream
import aws.smithy.kotlin.runtime.content.toByteStream
import aws.smithy.kotlin.runtime.http.HttpErrorCode.SDK_UNKNOWN
import aws.smithy.kotlin.runtime.http.HttpException
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.test.runTest
import org.junit.jupiter.api.assertThrows
import java.util.concurrent.atomic.AtomicInteger
import kotlin.test.Test
import kotlin.test.assertEquals


class ReplicateContentLengthBug() {
    @Test
    fun replicateContentLengthBug() {
        runTest {
            val client =
                S3Client.fromEnvironment {
                    credentialsProvider = // private
                }
            client.use {
                val myBucket = "my-temp-test-bucket-0"
                fun myTestKey(number: Int) = "/my-temp-test-key-$number"
                val array = byteArrayOf(1, 2, 3)
                val arraySize = array.size.toLong()
                val counter = AtomicInteger()
                suspend fun doPut(
                    stream: ByteStream,
                    contentLengthHeader: Long? = null
                ) {
                    client.putObject {
                        bucket = myBucket
                        key = myTestKey(counter.getAndIncrement())
                        body = stream
                        contentLength = contentLengthHeader
                    }
                }
                doPut(
                    stream = ByteStream.fromBytes(array),
                    contentLengthHeader = null
                )
                doPut(
                    stream = flow { emit(array) }.toByteStream(this, contentLength = arraySize),
                    contentLengthHeader = null
                )
                doPut(
                    stream = ByteStream.fromBytes(array),
                    contentLengthHeader = arraySize
                )
                val exception =
                    assertThrows<HttpException> {
                        doPut(
                            stream = flow { emit(array) }.toByteStream(this, contentLength = arraySize),
                            contentLengthHeader = arraySize
                        )
                    }
                assertEquals(SDK_UNKNOWN, exception.errorCode)
                assertEquals("java.net.ProtocolException: expected 3 bytes but received 175", exception.message)
            }
        }
    }
}

Possible Solution

  • More informative exceptions
  • More consistent behavior when using different bytestreams
  • Fully handle content-length related issues (like extra bytes being added)
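As a rough illustration of the first bullet, here is a minimal sketch of a length check whose message says which side of the mismatch is which. Everything in it is hypothetical: `ContentLengthMismatchException`, `checkBodyLength`, and the `transferEncodingApplied` flag are made-up names for this example and are not part of the SDK.

```kotlin
// Hypothetical sketch, not SDK code: a length check that names both numbers
// and hints at the likely cause instead of only reporting "expected X but received Y".
class ContentLengthMismatchException(message: String) : IllegalStateException(message)

fun checkBodyLength(
    declaredContentLength: Long,      // value of the Content-Length header
    bytesWritten: Long,               // bytes actually written to the connection
    transferEncodingApplied: Boolean, // assumption: the caller knows whether the body was re-framed
) {
    if (declaredContentLength == bytesWritten) return
    val hint = if (transferEncodingApplied) {
        "the body was re-encoded before sending (for example chunk framing), " +
            "so the header must describe the encoded size"
    } else {
        "the body source produced a different number of bytes than declared"
    }
    throw ContentLengthMismatchException(
        "Content-Length header declared $declaredContentLength bytes " +
            "but $bytesWritten bytes were written: $hint"
    )
}
```

With a message like this, the "3 versus 175" case would at least point at re-encoding rather than at the caller's data.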

Context

This problem actually first occurred for me a while ago, maybe almost a year ago. Then somehow I found a workaround, but it wasn't robust and I didn't fully understand it. Recently, I am not sure if it was an update to the library or changes I made on my end or both, but the issue came back. After a lot of debugging I finally narrowed down the problem.

I know it likely has something to do with chunked encoding, which I have seen mentioned in recent changelists. I want to emphasize that anything having to do with chunked encoding should be an implementation detail that is abstracted away by this library. I don't think as a user that I should have to worry about it for basic usage like I showed in the test.
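For what it's worth, the number 175 is consistent with chunk framing. The sketch below computes the on-the-wire size of a payload under the SigV4 signed "aws-chunked" layout, assuming each chunk is framed as hex(size) + ";chunk-signature=" + a 64-character hex signature + CRLF + data + CRLF, followed by a zero-length final chunk and a closing CRLF. Whether the SDK took exactly this path for the failing request is an assumption, not something confirmed in this thread, and `awsChunkedLength` is a hypothetical helper rather than an SDK function.

```kotlin
// Hypothetical helper: size of a payload after SigV4 signed "aws-chunked" framing.
// Assumed chunk layout: hex(size) + ";chunk-signature=" + 64 hex chars + CRLF + data + CRLF,
// terminated by a zero-length chunk plus one closing CRLF.
const val SIGNATURE_PREFIX = ";chunk-signature="
const val SIGNATURE_HEX_LENGTH = 64L // hex-encoded HMAC-SHA256
const val CRLF_LENGTH = 2L

// Length of the "<hex size>;chunk-signature=<signature>\r\n" line for one chunk.
fun chunkHeaderLength(chunkSize: Long): Long =
    chunkSize.toString(16).length + SIGNATURE_PREFIX.length + SIGNATURE_HEX_LENGTH + CRLF_LENGTH

fun awsChunkedLength(payloadSize: Long, maxChunkSize: Long = 64L * 1024): Long {
    require(payloadSize >= 0 && maxChunkSize > 0) { "sizes must be non-negative / positive" }
    var total = 0L
    var remaining = payloadSize
    while (remaining > 0) {
        val chunk = minOf(remaining, maxChunkSize)
        total += chunkHeaderLength(chunk) + chunk + CRLF_LENGTH // header, data, trailing CRLF
        remaining -= chunk
    }
    // Terminating zero-length chunk header, then the closing CRLF.
    total += chunkHeaderLength(0) + CRLF_LENGTH
    return total
}
```

Under these assumptions `awsChunkedLength(3)` evaluates to 175: 89 bytes for the single data chunk plus 86 bytes for the terminating chunk. That would explain the message if the Content-Length header still said 3 while the framed body was what actually got written.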

AWS Kotlin SDK version used

1.0.73

Platform (JVM/JS/Native)

JVM

Operating System and version

Mac

I'm able to replicate the failure and am taking a look at a fix. As a temporary workaround, specify the contentLength in the toByteStream(...) call instead of on the request object.

                client.putObject {
                    bucket = myBucket
                    key = myTestKey(counter.getAndIncrement())
                    body = flow { emit(array) }.toByteStream(this, contentLength = arraySize)
                    contentLength = null // don't specify contentLength here.
                }

Confirming that removing contentLength from the putObject configuration is a valid workaround. I have been using that workaround since I created this issue, and have not seen any exception since.

Hi, a fix has been merged and should be available in a few hours, under v1.0.79. Thanks again for your detailed report!
