Downloader doesn't follow PartSize parameter
EdoMan000 opened this issue
Describe the bug
When using NewDownloader or NewDownloaderWithClient to create a downloader with a specified PartSize, the setting seems to be ignored.
Even when the parameter is left unspecified, the downloader does not produce 5 MB chunks (which should be the default).
The resulting chunks alternate between 16384 and 1024 bytes (going as low as 569 bytes).
I am also curious why the size alternates between these values instead of staying consistent.
By the way, the upload counterpart works flawlessly with PartSize set.
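For reference, a minimal sketch of how we configure the uploader side (reusing sess, fileName, and localFile from the PoC below; the bucket name is a placeholder):

uploader := s3manager.NewUploaderWithClient(
	s3.New(sess),
	func(u *s3manager.Uploader) {
		u.PartSize = 1024 * 1024 * 10 // each uploaded part is 10MB, and this is honored
		u.Concurrency = 1
	},
)
_, err := uploader.Upload(&s3manager.UploadInput{
	Bucket: aws.String("example-bucket"), // placeholder
	Key:    aws.String(fileName),
	Body:   localFile,
})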
Expected Behavior
A Downloader created with a specified PartSize (e.g. 10 * 1024 * 1024, i.e. 10MB) should download chunks of that consistent 10MB size.
Current Behavior
The output of the proof of concept is:
...
downloaded chunk of 569 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
...
Reproduction Steps
Code of the PoC where the issue is visible:
package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type DownloadStream struct {
	array     []byte
	localFile *os.File
}

func main() {
	if err := SendFromS3("forest.jpg"); err != nil { // INSERT FILENAME HERE
		fmt.Println("download failed:", err)
	}
}

func SendFromS3(fileName string) error {
	sess := getSession()
	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 1024 * 1024 * 10 // 10MB FOR EACH CHUNK
			d.Concurrency = 1             // SAME PROBLEM WITH OTHER VALUES
		},
	)
	// SAME ISSUE HERE:
	// downloader := s3manager.NewDownloader(
	// 	sess,
	// 	func(d *s3manager.Downloader) {
	// 		d.PartSize = 1024 * 1024 * 10 // 10MB FOR EACH CHUNK
	// 		d.Concurrency = 1             // SAME PROBLEM WITH OTHER VALUES
	// 	},
	// )
	localFile, err := os.Create(fileName)
	if err != nil {
		return err
	}
	defer localFile.Close()
	_, err = downloader.Download(
		&DownloadStream{[]byte{}, localFile},
		&s3.GetObjectInput{
			Bucket: aws.String("sdcc-project.2023"), // INSERT AWS BUCKET NAME HERE
			Key:    aws.String(fileName),
		},
	)
	return err
}

func getSession() *session.Session {
	return session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-east-1"),                               // INSERT AWS REGION HERE
		Credentials: credentials.NewSharedCredentials("./credentials", ""), // INSERT PATH OF AWS CREDENTIALS HERE
	}))
}

func (downloadStream *DownloadStream) WriteAt(source []byte, off int64) (bytesSent int, err error) {
	downloadStream.array = append(downloadStream.array, source...)
	fmt.Printf("downloaded chunk of %d bytes\r\n", len(source))
	downloadStream.localFile.Write(source)
	return len(source), nil
}
Possible Solution
No response
Additional Information/Context
No response
SDK version used
The issue is reproducible on both SDK versions.
Environment details (Version of Go (go version)? OS name and version, etc.)
go 1.19
Hi @EdoMan000,
Thanks for reaching out.
I was able to reproduce this issue.
This only happens when streaming is used to write the chunks.
When writing the parts directly to a file, I don't see this behavior:
package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})
	if err != nil {
		panic(err)
	}
	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 10 * 1024 * 1024
			d.Concurrency = 1
		},
	)
	file, err := os.Create("downloaded_file")
	if err != nil {
		fmt.Println("Failed to create file,", err)
		return
	}
	defer file.Close()
	_, err = downloader.Download(
		file,
		&s3.GetObjectInput{
			Bucket: aws.String("foo-bucket"),
			Key:    aws.String("large-file"),
		},
	)
	if err != nil {
		fmt.Println("Error downloading file:", err)
	}
}
Not working:
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

type DownloadStream struct {
	localFile *os.File
}

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})
	if err != nil {
		log.Fatalf("Failed to create session: %v", err)
	}
	downloader := s3manager.NewDownloaderWithClient(
		s3.New(sess),
		func(d *s3manager.Downloader) {
			d.PartSize = 10 * 1024 * 1024
		},
	)
	fileName := "your-file-name"
	localFile, err := os.Create(fileName)
	if err != nil {
		log.Fatalf("Error creating file: %v", err)
	}
	defer localFile.Close()
	ds := &DownloadStream{localFile: localFile}
	_, err = downloader.Download(
		ds,
		&s3.GetObjectInput{
			Bucket: aws.String("foo-bucket"),
			Key:    aws.String("large-file"),
		},
	)
	if err != nil {
		log.Fatalf("Error downloading file: %v", err)
	}
	log.Println("Download completed.")
}

func (downloadStream *DownloadStream) WriteAt(p []byte, off int64) (n int, err error) {
	log.Printf("Attempting to download chunk at offset %d", off)
	n, err = downloadStream.localFile.WriteAt(p, off)
	if err != nil {
		log.Printf("Failed to write chunk at offset %d: %v", off, err)
		return 0, err
	}
	log.Printf("Successfully downloaded and wrote %d bytes at offset %d", len(p), off)
	return n, nil
}
/*
2023/10/10 11:48:15 Attempting to download chunk at offset 0
2023/10/10 11:48:15 Successfully downloaded and wrote 7541 bytes at offset 0
2023/10/10 11:48:15 Attempting to download chunk at offset 7541
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 7541
2023/10/10 11:48:15 Attempting to download chunk at offset 23925
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 23925
2023/10/10 11:48:15 Attempting to download chunk at offset 24949
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 24949
2023/10/10 11:48:15 Attempting to download chunk at offset 41333
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 41333
2023/10/10 11:48:15 Attempting to download chunk at offset 42357
2023/10/10 11:48:15 Successfully downloaded and wrote 10992 bytes at offset 42357
2023/10/10 11:48:15 Attempting to download chunk at offset 53349
2023/10/10 11:48:15 Successfully downloaded and wrote 9000 bytes at offset 53349
2023/10/10 11:48:15 Attempting to download chunk at offset 62349
2023/10/10 11:48:15 Successfully downloaded and wrote 16384 bytes at offset 62349
2023/10/10 11:48:15 Attempting to download chunk at offset 78733
2023/10/10 11:48:15 Successfully downloaded and wrote 1024 bytes at offset 78733
2023/10/10 11:48:15 Attempting to download chunk at offset 79757
2023/10/10 11:48:15 Successfully downloaded and wrote 9592 bytes at offset 79757
2023/10/10 11:48:16 Attempting to download chunk at offset 89349
2023/10/10 11:48:16 Successfully downloaded and wrote 16384 bytes at offset 89349
2023/10/10 11:48:16 Attempting to download chunk at offset 105733
2023/10/10 11:48:16 Successfully downloaded and wrote 1024 bytes at offset 105733
2023/10/10 11:48:16 Attempting to download chunk at offset 106757
2023/10/10 11:48:16 Successfully downloaded and wrote 9592 bytes at offset 106757
2023/10/10 11:48:16 Attempting to download chunk at offset 116349
*/
Will discuss this with the team for further investigation.
Thanks,
Ran~
Hi @RanVaknin,
Thank you for addressing this issue and trying to help.
Unfortunately my team and I noticed the problem because in our project each chunk is individually forwarded over Go channels to other components, making streaming mode essential.
Looking forward to hearing from you soon, I wish you the best and hope you can find a solution.
Edoardo
@EdoMan000 --
You're looking at the size of the data passed to WriteAt, not the requested/downloaded PartSize.
This is one of the download requests from your example above but with debug response logging enabled:
---[ REQUEST POST-SIGN ]-----------------------------
GET /temp_64MB_file HTTP/1.1
Host: foo5.s3.amazonaws.com
User-Agent: aws-sdk-go/1.45.24 (go1.21.0; darwin; amd64) S3Manager
Authorization: [...]
Range: bytes=10485760-20971519
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20231012T180237Z
X-Amz-Security-Token: [...]

-----------------------------------------------------
2023/10/12 14:02:37 DEBUG: Response s3/GetObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 206 Partial Content
Content-Length: 10485760
Accept-Ranges: bytes
Content-Range: bytes 10485760-20971519/67108864
Content-Type: binary/octet-stream
Date: Thu, 12 Oct 2023 18:02:38 GMT
Etag: "c92a5974ba98ef088a74f4b645d68aa9-4"
Last-Modified: Thu, 12 Oct 2023 17:57:32 GMT
Server: AmazonS3
X-Amz-Id-2: 6xkhjaaj2DlCwO7w8ZN58h5MTnZOerU05B+I2o9ht4upFZlYzVcWdzsViDNcM1fJQT5+KTQkPbA=
X-Amz-Request-Id: CTRBE7PVYRM9BNZT
X-Amz-Server-Side-Encryption: AES256

Note that the range headers indicate we've asked for and received the configured 10M slice.
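For clarity: the Range header is inclusive, so bytes=10485760-20971519 covers 20971519 - 10485760 + 1 = 10485760 bytes, which is exactly 10 * 1024 * 1024 (the Content-Length: 10485760 in the response confirms it).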
WriteAt is called multiple times per downloaded chunk, e.g.:
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 16384 bytes
downloaded chunk of 1024 bytes
downloaded chunk of 6839 bytes
The phrase "downloaded chunk" is inaccurate in the sense that you're logging the size of what's written, not what was downloaded.
If you directly wrap an *os.File's WriteAt, you will observe the same thing.
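For instance, a minimal sketch of such a wrapper (reusing the log and os imports from the snippet above; all names are illustrative):

// LoggingFile wraps *os.File and logs the size of every WriteAt call.
type LoggingFile struct {
	*os.File
}

func (f *LoggingFile) WriteAt(p []byte, off int64) (int, error) {
	log.Printf("WriteAt called with %d bytes at offset %d", len(p), off)
	return f.File.WriteAt(p, off)
}

Passing &LoggingFile{File: localFile} to Download logs the same 16384/1024-byte pattern, even though each ranged GetObject request covers a full PartSize.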
There's no guarantee that the implementation will write in chunks of length equal to that configuration, nor would you likely want it to by default - that would mean allocating a 10M write buffer, in your example.
Is there a reason that you need the sizes to be 1:1?
@lucix-aws thank you for your response.
Yes, in recent days it had actually occurred to me that the full 10MB might be downloaded and that the smaller parts were just the ones passed to WriteAt. However, the documentation did not make it clear that there was no such guarantee.
Yes, for us a 1:1 match would be important in order to keep chunk processing times acceptable (since we add latency at each WriteAt call).
For this reason, we were wondering if there is a way to force such behaviour, even if that means allocating bigger buffers each time.
It's not possible in the downloader today, nor is it something I imagine we'd support.
You'll need to wrap a delegate WriterAt to buffer and flush slices of the desired size accordingly. Note that you'll have to know the size of the object beforehand (most likely via HeadObject) since the size of the last chunk can be different (object size % PartSize).
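A minimal sketch of such a buffering delegate, assuming Concurrency is 1 so writes arrive in order, a positive chunkSize, and that the object size was fetched beforehand (e.g. from HeadObject's ContentLength); it needs the io import, and all names here are illustrative:

// BufferedWriterAt accumulates incoming writes and flushes them to the
// delegate in chunkSize slices.
type BufferedWriterAt struct {
	delegate  io.WriterAt // e.g. the underlying *os.File
	buf       []byte
	chunkSize int64 // desired flush size, e.g. the downloader's PartSize
	written   int64 // bytes already flushed to the delegate
	totalSize int64 // object size, e.g. from HeadObject's ContentLength
}

func (b *BufferedWriterAt) WriteAt(p []byte, off int64) (int, error) {
	// Assumes sequential writes (downloader Concurrency = 1), so off is
	// ignored and data is accumulated in arrival order.
	b.buf = append(b.buf, p...)
	// Flush every full chunk; also flush the final, possibly shorter,
	// chunk (object size % PartSize) once all bytes have arrived.
	for int64(len(b.buf)) >= b.chunkSize ||
		(len(b.buf) > 0 && b.written+int64(len(b.buf)) == b.totalSize) {
		n := b.chunkSize
		if int64(len(b.buf)) < n {
			n = int64(len(b.buf))
		}
		if _, err := b.delegate.WriteAt(b.buf[:n], b.written); err != nil {
			return 0, err
		}
		b.written += n
		b.buf = b.buf[n:]
	}
	return len(p), nil
}

You would then pass something like &BufferedWriterAt{delegate: localFile, chunkSize: 10 * 1024 * 1024, totalSize: objectSize} to Download, where objectSize comes from a prior HeadObject call.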
Closing this for now.
⚠️ COMMENT VISIBILITY WARNING ⚠️
Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue, feel free to do so.