aws / aws-sdk-go

AWS SDK for the Go programming language.

Home Page: http://aws.amazon.com/sdk-for-go/


Downloader doesn't follow PartSize parameter

EdoMan000 opened this issue

Describe the bug

When creating a downloader with NewDownloader or NewDownloaderWithClient and a specified PartSize, the setting appears to be ignored.
Even when PartSize is left unspecified, the downloader does not produce 5 MB chunks (the documented default).
The resulting chunks alternate between 16384 and 1024 bytes (going as low as 569 bytes).

I am also curious why the size alternates between these values rather than being consistent.
By the way, the upload counterpart works flawlessly with PartSize set.

Expected Behavior

A Downloader created with a specified PartSize (e.g. 10 * 1024 * 1024, i.e. 10 MB) should download chunks of that consistent 10 MB size.

Current Behavior

Output of the proof of concept:

...
downloaded chunk of 569 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
...

Reproduction Steps

Code of the PoC where the issue is visible:

package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type DownloadStream struct {
	array     []byte
	localFile *os.File
}

func main() {
	SendFromS3("forest.jpg") // INSERT FILENAME HERE
}

func SendFromS3(fileName string) error {
	sess := getSession()
	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 1024 * 1024 * 10 // 10MB FOR EACH CHUNK
			d.Concurrency = 1             // SAME PROBLEM WITH OTHER VALUES
		},
	)
	// SAME ISSUE HERE:
	// downloader := s3manager.NewDownloader(
	// 	sess,
	// 	func(d *s3manager.Downloader) {
	// 		d.PartSize = 1024 * 1024 * 10 // 10MB FOR EACH CHUNK
	// 		d.Concurrency = 1             // SAME PROBLEM WITH OTHER VALUES
	// 	},
	// )
	localFile, err := os.Create(fileName)
	if err != nil {
		return err
	}
	defer localFile.Close()
	_, err = downloader.Download(
		&DownloadStream{[]byte{}, localFile},
		&s3.GetObjectInput{
			Bucket: aws.String("sdcc-project.2023"), //INSERT AWS BUCKET NAME HERE
			Key:    aws.String(fileName),
		},
	)
	if err != nil {
		return err
	}

	return nil
}

func getSession() *session.Session {
	sess := session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-east-1"),                               // INSERT AWS REGION HERE
		Credentials: credentials.NewSharedCredentials("./credentials", ""), // INSERT PATH OF AWS CREDENTIALS HERE
	}))
	return sess
}

// WriteAt implements io.WriterAt. Note that it appends sequentially and
// ignores the offset, so it is only correct while Concurrency is 1.
func (downloadStream *DownloadStream) WriteAt(source []byte, off int64) (n int, err error) {
	downloadStream.array = append(downloadStream.array, source...)
	fmt.Printf("downloaded chunk of %d bytes\r\n", len(source))
	if _, err := downloadStream.localFile.Write(source); err != nil {
		return 0, err
	}
	return len(source), nil
}

Possible Solution

No response

Additional Information/Context

No response

SDK version used

The issue is reproducible on both SDK versions.

Environment details (Version of Go (go version)? OS name and version, etc.)

go 1.19

Hi @EdoMan000,

Thanks for reaching out.

I was able to reproduce this issue.

This only happens when using a streaming writer for the chunks.

When writing the parts directly to a file, I don't see this behavior:

package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})
	if err != nil {
		panic(err)
	}

	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 10 * 1024 * 1024
			d.Concurrency = 1
		},
	)

	file, err := os.Create("downloaded_file")
	if err != nil {
		fmt.Println("Failed to create file,", err)
		return
	}
	defer file.Close()

	_, err = downloader.Download(
		file,
		&s3.GetObjectInput{
			Bucket: aws.String("foo-bucket"),
			Key:    aws.String("large-file"),
		},
	)
	if err != nil {
		fmt.Println("Error downloading file:", err)
	}
}

Not working:

package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type DownloadStream struct {
	localFile *os.File
}

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})
	if err != nil {
		log.Fatalf("Failed to create session: %v", err)
	}

	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 10 * 1024 * 1024
		},
	)

	fileName := "your-file-name"
	localFile, err := os.Create(fileName)
	if err != nil {
		log.Fatalf("Error creating file: %v", err)
	}
	defer localFile.Close()

	ds := &DownloadStream{localFile: localFile}

	_, err = downloader.Download(
		ds,
		&s3.GetObjectInput{
			Bucket: aws.String("foo-bucket"),
			Key:    aws.String("large-file"),
		},
	)
	if err != nil {
		log.Fatalf("Error downloading file: %v", err)
	}

	log.Println("Download completed.")
}

func (downloadStream *DownloadStream) WriteAt(p []byte, off int64) (n int, err error) {
	log.Printf("Attempting to download chunk at offset %d", off)

	n, err = downloadStream.localFile.WriteAt(p, off)
	if err != nil {
		log.Printf("Failed to write chunk at offset %d: %v", off, err)
		return 0, err
	}

	log.Printf("Successfully downloaded and wrote %d bytes at offset %d", len(p), off)

	return n, nil
}
/*
2023/10/10 11:48:15 Attempting to download chunk at offset 0
2023/10/10 11:48:15 Successfully downloaded and wrote 7541 bytes at offset 0
2023/10/10 11:48:15 Attempting to download chunk at offset 7541
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 7541
2023/10/10 11:48:15 Attempting to download chunk at offset 23925
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 23925
2023/10/10 11:48:15 Attempting to download chunk at offset 24949
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 24949
2023/10/10 11:48:15 Attempting to download chunk at offset 41333
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 41333
2023/10/10 11:48:15 Attempting to download chunk at offset 42357
2023/10/10 11:48:15 Successfully downloaded and wrote 10992 bytes at offset 42357
2023/10/10 11:48:15 Attempting to download chunk at offset 53349
2023/10/10 11:48:15 Successfully downloaded and wrote 9000 bytes at offset 53349
2023/10/10 11:48:15 Attempting to download chunk at offset 62349
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 62349
2023/10/10 11:48:15 Attempting to download chunk at offset 78733
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 78733
2023/10/10 11:48:15 Attempting to download chunk at offset 79757
2023/10/10 11:48:15 Successfully downloaded and wrote 9592 bytes at offset 79757
2023/10/10 11:48:16 Attempting to download chunk at offset 89349
2023/10/10 11:48:16 Successfully downloaded and wrote 16384 bytes at offset 89349
2023/10/10 11:48:16 Attempting to download chunk at offset 105733
2023/10/10 11:48:16 Successfully downloaded and wrote 1024 bytes at offset 105733
2023/10/10 11:48:16 Attempting to download chunk at offset 106757
2023/10/10 11:48:16 Successfully downloaded and wrote 9592 bytes at offset 106757
2023/10/10 11:48:16 Attempting to download chunk at offset 116349
*/

Will discuss this with the team for further investigation.
Thanks,
Ran~

Hi @RanVaknin,

Thank you for addressing this issue and trying to help.
Unfortunately, my team and I ran into the problem precisely because in our project each chunk is forwarded individually over Go channels to other components, which makes streaming mode essential for us.

Looking forward to hearing from you soon, I wish you the best and hope you can find a solution.
Edoardo

@EdoMan000 --

You're looking at the size of the data passed to WriteAt, not the requested/downloaded PartSize.

This is one of the download requests from your example above but with debug response logging enabled:

---[ REQUEST POST-SIGN ]-----------------------------
GET /temp_64MB_file HTTP/1.1
Host: foo5.s3.amazonaws.com
User-Agent: aws-sdk-go/1.45.24 (go1.21.0; darwin; amd64) S3Manager
Authorization: [...]
Range: bytes=10485760-20971519
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20231012T180237Z
X-Amz-Security-Token: [...]

-----------------------------------------------------
2023/10/12 14:02:37 DEBUG: Response s3/GetObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 206 Partial Content
Content-Length: 10485760
Accept-Ranges: bytes
Content-Range: bytes 10485760-20971519/67108864
Content-Type: binary/octet-stream
Date: Thu, 12 Oct 2023 18:02:38 GMT
Etag: "c92a5974ba98ef088a74f4b645d68aa9-4"
Last-Modified: Thu, 12 Oct 2023 17:57:32 GMT
Server: AmazonS3
X-Amz-Id-2: 6xkhjaaj2DlCwO7w8ZN58h5MTnZOerU05B+I2o9ht4upFZlYzVcWdzsViDNcM1fJQT5+KTQkPbA=
X-Amz-Request-Id: CTRBE7PVYRM9BNZT
X-Amz-Server-Side-Encryption: AES256

Note that the range headers indicate we've asked for and received the configured 10M slice: the request's Range: bytes=10485760-20971519 covers 20971519 - 10485760 + 1 = 10485760 bytes (exactly 10 MiB), and the response's Content-Length confirms it.

WriteAt is called multiple times per downloaded chunk, e.g.:

downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 6839 bytes

The phrase "downloaded chunk" is inaccurate in the sense that you're logging the size of what's written, not what was downloaded. The alternating write sizes simply reflect however many bytes each read from the HTTP response body happens to return (the recurring 16384-byte writes, for instance, likely correspond to the 16 KB maximum TLS record size).

If you directly wrap an *os.File's WriteAt you will observe the same thing.

There's no guarantee that the implementation will write in chunks of length equal to that configuration, nor would you likely want it to by default: in your example, that would mean allocating a 10 MB write buffer.

Is there a reason that you need the sizes to be 1:1?

@lucix-aws thank you for your response.

Yes, actually, over the past few days it had occurred to me that the SDK might in fact be downloading the full 10 MB and that the smaller parts were just what was being passed to WriteAt. However, the documentation did not make it clear that there was no such guarantee.

For us a 1:1 match would be important to keep chunk processing times acceptable (since we add latency on each WriteAt call).

For this reason we were wondering whether there is a way to force that behaviour, even if it would mean allocating bigger buffers each time.

It's not possible in the downloader today, nor is it something I imagine we'd support.

You'll need to wrap a delegate WriterAt that buffers and flushes slices of the desired size accordingly. Note that you'll have to know the size of the object beforehand (most likely via HeadObject), since the size of the last chunk can differ (object size % PartSize).
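
For illustration, here is a minimal sketch of that delegate approach, assuming the downloader's Concurrency is left at 1 so that WriteAt calls arrive in offset order. The BufferedWriterAt type and its Flush method are hypothetical names, not part of the SDK:

package chunkbuffer

import (
	"fmt"
	"io"
)

// BufferedWriterAt wraps a delegate io.WriterAt and forwards data to it in
// fixed-size chunks, regardless of the sizes the downloader passes in.
// It assumes sequential writes (downloader Concurrency == 1).
type BufferedWriterAt struct {
	dst       io.WriterAt // e.g. an *os.File, or an adapter that feeds a Go channel
	buf       []byte      // bytes received but not yet flushed
	chunkSize int         // desired flush size, e.g. 10 * 1024 * 1024
	flushed   int64       // offset of the next byte to hand to dst
}

func NewBufferedWriterAt(dst io.WriterAt, chunkSize int) *BufferedWriterAt {
	return &BufferedWriterAt{dst: dst, chunkSize: chunkSize}
}

// WriteAt accumulates incoming bytes and flushes full chunks to dst.
func (w *BufferedWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if off != w.flushed+int64(len(w.buf)) {
		// Out-of-order write: Concurrency must be 1 for this sketch.
		return 0, fmt.Errorf("non-sequential write at offset %d", off)
	}
	w.buf = append(w.buf, p...)
	for len(w.buf) >= w.chunkSize {
		if _, err := w.dst.WriteAt(w.buf[:w.chunkSize], w.flushed); err != nil {
			return 0, err
		}
		w.flushed += int64(w.chunkSize)
		w.buf = w.buf[w.chunkSize:]
	}
	return len(p), nil
}

// Flush emits the final partial chunk (object size % PartSize bytes).
// Call it after Download returns.
func (w *BufferedWriterAt) Flush() error {
	if len(w.buf) == 0 {
		return nil
	}
	if _, err := w.dst.WriteAt(w.buf, w.flushed); err != nil {
		return err
	}
	w.flushed += int64(len(w.buf))
	w.buf = nil
	return nil
}

As a design note, this sketch avoids the HeadObject lookup by exposing an explicit Flush for the final partial chunk, to be called after Download returns; if the tail must instead be emitted from inside WriteAt itself, pass the object size obtained via HeadObject into the wrapper and flush when the final byte arrives.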

Closing this for now.

⚠️ COMMENT VISIBILITY WARNING ⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.