awslabs / aws-sdk-kotlin

Multiplatform AWS SDK for Kotlin

ByteStream Extension fun does not work as expected

lauzadis opened this issue · comments

I tried to rewrite my code using Flow<ByteArray>.toByteStream, but encountered an error like this.

The GitHub Actions stack trace: https://github.com/hantsy/aws-sdk-kotlin-spring-example/actions/runs/6939132384/job/18875971395?pr=6#step:6:79

The original code that caused this issue:

suspend fun S3Client.store(bucketName: String, resourceKey: String, data: Flux<DataBuffer>) {
    this.createBucketIfNotExists(bucketName)
    val mediaType = MediaTypeFactory.getMediaType(resourceKey)
        .orElseGet { MediaType.APPLICATION_OCTET_STREAM }

    val byteArrayFlow = data
        .map { dataBuffer ->
            val bytes = ByteArray(dataBuffer.readableByteCount())
            dataBuffer.read(bytes)
            DataBufferUtils.release(dataBuffer)
            bytes
        }
        .asFlow()

    val request = PutObjectRequest {
        bucket = bucketName
        body = byteArrayFlow.toByteStream(applicationScope) // here I use toByteStream to transfer the data type.
        key = resourceKey
        contentType = mediaType.toString()
    }
    val result = try {
        this.putObject(request)
    } catch (e: Exception) {
        throw S3ClientException(e.message ?: "Failed to store object $resourceKey")
    }
    println("store object to $bucketName: ${result.eTag}")
}

Originally posted by @hantsy in #1127

Hi, I'm able to replicate this. Looking into a potential fix now.

My replication:

    val client = S3Client.fromEnvironment {
        credentialsProvider = // your credentials here
        region = "us-east-1"
    }

    val byteArrayFlow: Flow<ByteArray> = flowOf("abc".encodeToByteArray(), "def".encodeToByteArray())
    client.putObject {  
        bucket = // your bucket here
        key = "playing-with-flows.dat"
        body = byteArrayFlow.toByteStream(this@runBlocking)
    }

I see the same error, "Stream must be replayable to calculate a body hash", which is thrown during signing / canonicalization when the body is not replayable.
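
To illustrate what "replayable" means here (this is not the fix discussed below, just a sketch under the assumption that the payload is small enough to buffer in memory): an in-memory body built with ByteStream.fromBytes can be read more than once, so the signer can hash it. The helper name toReplayableByteStream is made up for this example.

import aws.smithy.kotlin.runtime.content.ByteStream
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.fold
// depending on the SDK version, fromBytes may also need its own import

// Hypothetical helper (not part of the SDK): collect a Flow<ByteArray> into memory and
// wrap it in an in-memory ByteStream, which is replayable and has a known content length.
suspend fun Flow<ByteArray>.toReplayableByteStream(): ByteStream {
    val buffered = fold(ByteArray(0)) { acc, chunk -> acc + chunk }
    return ByteStream.fromBytes(buffered)
}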

I am not sure why it has to accept a CoroutineScope as a parameter. I created a custom scope like this,

private val applicationScope = CoroutineScope(SupervisorJob() + Dispatchers.IO)

It did not work.

So it cannot process a Flow that is a hot stream, such as multipart data from Spring reactive?

At this time S3 requires Content-Length to be set on all requests (see this issue for more explanation). So, to successfully make the request, you need to provide a content length in the call to toByteStream.

    val client = S3Client.fromEnvironment {
        credentialsProvider = // your credentials here
        region = "us-east-1"
    }

    val byteArrayFlow: Flow<ByteArray> = flowOf("abc".encodeToByteArray(), "def".encodeToByteArray())
    client.putObject {  
        bucket = // your bucket here
        key = "playing-with-flows.dat"
        body = byteArrayFlow.toByteStream(this@runBlocking, 6) // must provide content length
    }

Got it.

I hope there will be a real dynamic Flow improvement that does not require the content length. Spring reactive does not need it when reading/writing content over HTTP.

If you're handling an HTTP request and proxying data to S3, you might be able to use the HTTP request's Content-Length header if it's available to you.
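
A rough sketch of that suggestion, assuming a Spring WebFlux ServerHttpRequest is at hand; the helper name proxyToS3 is made up for this example, the error handling is only illustrative, and the import for the SDK's Flow<ByteArray>.toByteStream extension is omitted because its package depends on the SDK version:

import aws.sdk.kotlin.services.s3.S3Client
import aws.sdk.kotlin.services.s3.model.PutObjectRequest
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.reactive.asFlow
import org.springframework.core.io.buffer.DataBufferUtils
import org.springframework.http.server.reactive.ServerHttpRequest

// Hypothetical helper: stream an incoming reactive request body to S3, reusing the
// client-supplied Content-Length header as the length passed to toByteStream.
suspend fun S3Client.proxyToS3(
    request: ServerHttpRequest,
    bucketName: String,
    resourceKey: String,
    scope: CoroutineScope,
) {
    // HttpHeaders.getContentLength() returns -1 when the header is absent.
    val contentLength = request.headers.contentLength.takeIf { it >= 0 }
        ?: error("Content-Length header is required to stream to S3")

    val byteArrayFlow = request.body
        .map { dataBuffer ->
            val bytes = ByteArray(dataBuffer.readableByteCount())
            dataBuffer.read(bytes)
            DataBufferUtils.release(dataBuffer)
            bytes
        }
        .asFlow()

    val putRequest = PutObjectRequest {
        bucket = bucketName
        key = resourceKey
        body = byteArrayFlow.toByteStream(scope, contentLength)
    }
    putObject(putRequest)
}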

It's possible that we can abstract this in the future, but it's a limitation of the underlying S3 PutObject request rather than of the SDK.

Closing this as there is no further action to take at this time.

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

I also stumbled across this issue and it seems that setting the contentLength isn't (always) enough.

In my case I have:

  • an isOneShot ByteStream
  • a request of known size with a known SHA256
  • a request smaller than the chunk size

In this case, hashSpecification in AwsHttpSigner is set to CalculateFromPayload because:

  • contextHashSpecification == null
  • body != HttpBody.Empty
  • !body.isEligibleForAwsChunkedStreaming
  • !isUnsignedPayload

This later leads to the same issue.

It might be that this is really only due to the small object size (~20 KB); would chunking fix this?

My workaround was the following:

S3Client {
    interceptors.add(object : HttpInterceptor {
        override suspend fun modifyBeforeSigning(context: ProtocolRequestInterceptorContext<Any, HttpRequest>): HttpRequest {
            (context.request as? PutObjectRequest)?.let { putObjectRequest ->
                val body = putObjectRequest.body
                val sha256 = putObjectRequest.checksumSha256
                if (body?.isOneShot == true && sha256 != null) {
                    // Derive the signing hash from the known content SHA256 (base64-decoded, then hex-encoded)
                    val sha256Hex = Base64.getDecoder().decode(sha256).encodeToHex()
                    context.executionContext.attributes[AwsSigningAttributes.HashSpecification] = HashSpecification.Precalculated(sha256Hex)
                }
            }
            return context.protocolRequest
        }
    })
}

Maybe it would make sense to add this special case to the library?
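
For reference, a rough usage sketch under the same assumptions (a one-shot body of known length, with the SHA-256 computed up front so checksumSha256 is set and the interceptor above can pick it up); the function name and bucket/key values are placeholders, and the toByteStream import is again omitted:

import aws.sdk.kotlin.services.s3.S3Client
import aws.sdk.kotlin.services.s3.model.PutObjectRequest
import java.security.MessageDigest
import java.util.Base64
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.flow.flowOf

// Illustrative only: a one-shot streaming body with a known length and a precomputed
// base64 SHA-256 set on checksumSha256, which is what the interceptor above keys on.
suspend fun putWithKnownSha256(client: S3Client, scope: CoroutineScope, bytes: ByteArray) {
    val sha256Base64 = Base64.getEncoder()
        .encodeToString(MessageDigest.getInstance("SHA-256").digest(bytes))

    val request = PutObjectRequest {
        bucket = "my-bucket"            // placeholder
        key = "example-object.dat"      // placeholder
        body = flowOf(bytes).toByteStream(scope, bytes.size.toLong())
        checksumSha256 = sha256Base64
    }
    client.putObject(request)
}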

@felixscheinost can you open a new issue (with a reproduction if possible)? If you have a known content length then it should work with or without chunking, and we should probably take a closer look at what you're seeing.