grafana / k6-jslib-aws

Javascript Library allowing to interact with AWS resources from k6 scripts

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

S3Client.getObject does not work for binary data

KOConchobhair opened this issue · comments

On top of #45, the S3Client cannot be used to download a binary file from S3!

as I'm sure you are aware, if you have discardResponseBodies=false
there’s a Params.responseType that you can pass to a call to k6's http like

  const res = http.get('https://httpbin.test.k6.io/bytes/10', {
    responseType: 'binary',
  });

and that controls whether res.body is a string or an ArrayBuffer.

however since the S3Client code does not pass in a responseType in this code

const res = await http.asyncRequest(method, signedRequest.url, null, {
headers: signedRequest.headers,
})

then that means it defaults to responseType: 'text' and that means

  • text: k6 will return it as a string. This might not be what you want in cases of binary data as the conversation to UTF-16 will probably break it.
    This is also the default if discardResponseBodies is set to false or not set at all.

We need a way to either pass thru or hard-code a responseType: 'binary' for the k6 http client call in getObject.
Do you have a preference on how that might be implemented?

Hi @KOConchobhair,

Thanks for raising this topic 🙇🏻 First, let me ask you, as a library user, do you have any preference?

From my side, I don't have a clear path in mind, but some of the thoughts from the top of my mind are:

  • In the context of S3.getObject specifically, I think that in general is better to assume that the contents are binary (use ArrayBuffer?), because no matter what kind of content is more common, I think it's generally easier to convert the body into a string later, if needed, than the opposite.
    • Although I know this is probably not very feasible now as it would represent a breaking change in respect to the current behavior.
  • That said, unless we want to go all-in with ArrayBuffer (which doesn't sound really bad, except for what I mentioned above), I guess we'll need to add a new optional argument to the S3.getObject function.
    • What I'm wondering now is whether would make more sense as a client parameter/configuration, but at first glance it looks quite specific, although a few other service calls may also return binary data.

That said, let me invoke @oleiade, because as the main contributor of the project, he might have some thoughts to share.

hmm, it seems that the problem is worse than simply passing the responseType into the k6 http request. Because this doesn't even work:

        const res = await http.asyncRequest(method, signedRequest.url, null, {
            headers: signedRequest.headers,
            responseType: 'binary',
        })
        console.log(res)

it always returns {} for res.body

console.log(res) --> https://pastebin.com/raw/S3mZR9CH

The Typescript types for the RefinedResponseBody seem to want to treat the data as bytes in this case
https://github.com/DefinitelyTyped/DefinitelyTyped/blob/c58e3459ea3d434ca1e7bf3a0e1ffcb4a95d363b/types/k6/http.d.ts#L631

But it seems the k6 http module is returning an ArrayBuffer
https://github.com/grafana/k6/blob/73aaa790da2c66d9f3bbc3fd059246fc6ad5b63d/js/modules/k6/http/request.go#L134

I don't know if this is the root cause but it appears only responseType: 'text' is functional atm. I also confirmed that there's no issue with localstack and downloading an image file via the S3 API (using aws cli) so the problem appears to be in k6. Let me know if there is another repo to escalate this issue to.

okay, i think I see what's going on here. Even with the types fixed (I got a PR for that going here: DefinitelyTyped/DefinitelyTyped#69781) the k6 script itself is javascript so its a little confusing how to access the value for res.body. For example, res.body.byteLength works but console.log(res.body) always converts to empty object {}. Maybe thats expected for javascript/ArrayBuffer, I'm not sure. But the good news is the k6 http module is working fine for binary files! So I will proceed with a PR to address the originally reported issue with the following recommended approach (which should also help with #45 as well)

add a new optional argument to the S3.getObject function.

Hey @KOConchobhair,

Thanks for spending time on this, and for bringing all the details! 🙇🏻 Really appreciated!

For example, res.body.byteLength works but console.log(res.body) always converts to empty object {}.

I guess this might be related with 👉🏻 grafana/k6#3677 (comment)

In any case, regarding your contributions:

Thanks! 🙏🏻

  • Really looking forward for that PR to address the reported issue.

Okay, a PR for this issue is ready for review: #107
(I also think it mostly resolves #45 as well)

I've got a second PR for general cleanup/unit test fix ready for review as well: #106

Regarding the types, yeah I suppose I could move the changes there. Although while I don't claim to understand the release process, this fix seems like a bug to me in the currently released v0.51.0 version of k6. Why would we have to wait until v0.52.0 to fix an existing issue with the types? It would seem DefinitelyTyped can handle an update prior to to the next version.

For context, I believe this is connected to #45; for which I unfortunately didn't find a satisfying solution at the time.

Regarding the type definitions, we try to keep changes aligned there with the k6 releases indeed. However, in this specific case, I believe we could probably go ahead and amend the k6 v0.51 definitions directly.

Hey folks 👋🏻

First and foremost, thanks for raising this and proactively contributing to the resolution @KOConchobhair 🙇🏻

I've been considering our options there, and although k6 initially adopted this approach in its HTTP module, and it made sense there, I'd rather avoid adding a responseType parameter to getObject.

Although we don't support most of them, the S3 getObject action actually supports many pre-existing request parameters, and I'd rather leverage those instead.

Thus, my preference would be to add an additionalHeaders (or any better name) argument to s3.getObject, which would be added to the underlying request. Using this, we could integrate the logic in the method so that if, say, the additional headers contain an Accept header set to application/octet-stream, we then treat the response as binary and return an ArrayBuffer or bytes.

That way, we can also start supporting more of the underlying API call request parameters as we need them in the future.

What do you folks think?

For reference, I've tried implementing it on the side to test the idea, and I quite like the look and feel...

implementation

async getObject(bucketName: string, objectKey: string, additionalHeaders = {}): Promise<S3Object> {
        // Prepare request
        const method = 'GET'

        const signedRequest = this.signature.sign(
            {
                method: method,
                endpoint: this.endpoint,
                path: encodeURI(`/${bucketName}/${objectKey}`),
                headers: {
                    ...additionalHeaders
                },
            },
            {}
        )


        let responseType: ResponseType = 'text'
        if ('Accept' in additionalHeaders &&
            additionalHeaders['Accept'] !== undefined &&
            additionalHeaders['Accept'] === "application/octet-stream") {
           responseType = "binary"
        }

        const res = await http.asyncRequest(method, signedRequest.url, null, {
            headers: signedRequest.headers,
            responseType: responseType as ResponseType,
        })
        this._handle_error('GetObject', res)

        return new S3Object(
            objectKey,
            Date.parse(res.headers['Last-Modified']),
            res.headers['ETag'],
            parseInt(res.headers['Content-Length']),

            // The X-Amz-Storage-Class header is only set if the storage class is
            // not the default 'STANDARD' one.
            (res.headers['X-Amz-Storage-Class'] ?? 'STANDARD') as StorageClass,

            res.body
        )
    }

Usage

// Let's redownload it, verify it's correct, and delete it
    const downloadedFileContent = await s3.getObject(
        testBucketName,
        testFileKey,
        { 'Accept': 'application/octet-stream' })

    console.log(`downloadedFileContent.byteLength: ${downloadedFileContent.data.byteLength}`)
    console.log(downloadedFileContent.data instanceof ArrayBuffer)

    if (downloadedFileContent.byteLength !== testFileContent.byteLength) {
        exec.test.abort("Downloaded file size does not match the original file size")
    }