aws / aws-sdk-go

AWS SDK for the Go programming language.

Home Page:http://aws.amazon.com/sdk-for-go/

S3: error calling ListObjectsV2 with unusual file name in results

NathanBaulch opened this issue

Describe the bug

I'm unable to list the contents of a bucket due to the presence of a file with "%10" in its name.

Expected Behavior

Successfully return the file object.

Current Behavior

Error:

could not list objects: SerializationError: failed to decode REST XML response
        status code: 200, request id: XM6P29PNE0M2FX9S
caused by: XML syntax error on line 2: illegal character code U+0010

Digging deeper, it looks like the XML unmarshaler is tripping up on the string sequence &#x10; in the file name 2018_POSTER&#x10;_PRE.jpg. According to the AWS Console, the actual file name is 2018_POSTER%10_PRE.jpg.

Reproduction Steps

Complete example:

x := `
<ListBucketResult>
    <Contents>
        <Key>2018_POSTER&#x10;_PRE.jpg</Key>
    </Contents>
</ListBucketResult>`
r := &request.Request{
	HTTPResponse: &http.Response{Body: io.NopCloser(strings.NewReader(x))},
	Data:         &s3.ListObjectsV2Output{},
}
restxml.Unmarshal(r)
if r.Error != nil {
	panic(r.Error)
	// panic: SerializationError: failed to decode REST XML response
	// caused by: XML syntax error on line 3: illegal character code U+0010
}

Possible Solution

No response

Additional Information/Context

No response

SDK version used

v1.44.320

Environment details (Version of Go (go version)? OS name and version, etc.)

go1.21.3 windows/amd64

Hi @NathanBaulch,

I'm not able to reproduce this reported behavior.

package main

import (
	"context"
	"fmt"
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	ctx := context.Background()
	sess, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})
	if err != nil {
		panic(err)
	}

	svc := s3.New(sess)
	
	out, err := svc.ListObjectsV2WithContext(ctx, &s3.ListObjectsV2Input{
		Bucket: aws.String("foo-bucket-REDACTED"),
	})
	if err != nil {
		panic(err)
	}

	fmt.Println(len(out.Contents))
}

This prints fine:

2023/10/30 09:28:07 
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult
	xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Name>foo-bucket-REDACTED</Name>
	<Prefix></Prefix>
	<KeyCount>1</KeyCount>
	<MaxKeys>1000</MaxKeys>
	<IsTruncated>false</IsTruncated>
	<Contents>
		<Key>2018_POSTER%10_PRE.jpg</Key>
		<LastModified>2023-10-30T16:26:46.000Z</LastModified>
		<ETag>REDACTED</ETag>
		<Size>59015</Size>
		<StorageClass>STANDARD</StorageClass>
	</Contents>
</ListBucketResult>

Please note that S3's object naming guidelines specifically list % as a character to avoid because it requires special handling. One thing you can try is passing EncodingType: aws.String("url") in the input to see if this alleviates the issue.

I also noticed you are using an older SDK version. Can you try either the newest version or Go SDK v2?

Thanks,
Ran~

It looks like the file was created (not by me!) with a 0x10 character in the name. This is pretty easy to reproduce:

_, err := s3c.PutObject(&s3.PutObjectInput{
	Bucket:      aws.String(myBucket),
	Key:         aws.String("_test/foo\x10bar.txt"),
	ContentType: aws.String("text/plain"),
	Body:        strings.NewReader("hello world"),
})
if err != nil {
	panic(err)
}
_, err = s3c.ListObjectsV2(&s3.ListObjectsV2Input{
	Bucket: aws.String(myBucket),
	Prefix: aws.String("_test/"),
})
if err != nil {
	panic(err) // XML syntax error on line 9: illegal character code U+0010
}

I understand this is totally against file naming recommendations (again, not by me!), but what do I do now that I'm in this situation? I need to reliably iterate over this bucket's contents in Golang!

Hi @NathanBaulch ,

Ah, now I see what is happening. In a Go string literal, \x is an escape sequence that specifies a byte value in hexadecimal, so the key really contains the byte 0x10, the ASCII DLE control character. That character is not a legal character in XML 1.0, so when the SDK decodes the XML response and runs into the key, the decoder rejects it with the "illegal character code U+0010" error.

You can get around it by specifying that you want to get url encoded data:

out, err := svc.ListObjectsV2WithContext(ctx, &s3.ListObjectsV2Input{
	Bucket:       aws.String(myBucket),
	EncodingType: aws.String("url"),
	Prefix:       aws.String("_test/"),
})
if err != nil {
	panic(err)
}
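With EncodingType set to "url", the keys in the response are percent-encoded, so they may need decoding before they are used as real object keys. A minimal sketch, assuming the SDK hands the keys back still encoded (worth checking against your SDK version) and that "+" in the encoded form stands for a space; decodeKey is a hypothetical helper, not part of the SDK.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeKey reverses the percent-encoding applied to a key.
// Assumption: "+" represents a space, which is what url.QueryUnescape
// implements; switch to url.PathUnescape if "+" should stay literal.
func decodeKey(encoded string) (string, error) {
	return url.QueryUnescape(encoded)
}

func main() {
	// Example value: the key from the reproduction above as it would
	// appear percent-encoded in the listing.
	key, err := decodeKey("_test/foo%10bar.txt")
	if err != nil {
		panic(err)
	}
	// %q prints the control character in escaped form instead of raw.
	fmt.Printf("%q\n", key)
}
```

The decoded string can then be passed as the Key in later calls such as GetObject.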

Thanks,
Ran~

Perfect, thanks.
