aws / amazon-s3-plugin-for-pytorch

Explore using S3 Transfer Manager

cobookman opened this issue

The AWS SDK provides an S3 Transfer Manager that handles S3 copy-to-memory operations. It looks like the library only uses the TransferManager when downloading a multi-part object.

The library seems to use a single thread for downloading non-multipart objects. If the object is large enough, having multiple threads download different chunks of it would improve performance. It might pay to have both multipart and non-multipart objects go through the TransferManager. This is the current call on the non-multipart path:

return readS3Client(offset, n, buffer);
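
To make the suggestion concrete, here is a minimal sketch of how one read of `n` bytes at `offset` could be divided into byte ranges that separate threads fetch independently. `SplitIntoChunks` is a hypothetical helper, not part of the plugin or the AWS SDK, and the chunk size is an arbitrary assumption. It only computes the ranges and issues no S3 requests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the plugin): splits the byte range
// [offset, offset + n) into consecutive chunks of at most chunk_size bytes.
// Each element is a (chunk_offset, chunk_length) pair.
std::vector<std::pair<uint64_t, uint64_t>> SplitIntoChunks(
    uint64_t offset, uint64_t n, uint64_t chunk_size) {
  assert(chunk_size > 0);  // a zero chunk size would never make progress
  std::vector<std::pair<uint64_t, uint64_t>> chunks;
  for (uint64_t done = 0; done < n; done += chunk_size) {
    // The last chunk may be shorter than chunk_size.
    uint64_t len = std::min(chunk_size, n - done);
    chunks.emplace_back(offset + done, len);
  }
  return chunks;
}
```

Each pair could then be turned into a ranged GET and written to `buffer + (chunk_offset - offset)`, so the threads never write to overlapping memory.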

@cobookman

We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#165).
We're dropping support for this plugin.

As for your question, I believe the S3 Transfer Manager, or multi-part downloading, is always used unless directed otherwise:

multi_part_download_ = true;
const char *multi_download_disable_char =
    getenv("S3_DISABLE_MULTI_PART_DOWNLOAD");
if (multi_download_disable_char) {
    std::string multi_download_disable_str(multi_download_disable_char);
    if (multi_download_disable_str == "ON") {
        multi_part_download_ = false;
    }
}
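
For anyone who wants to check the behaviour in isolation, the same logic can be sketched as a standalone function. `MultiPartDownloadEnabled` is a hypothetical name used for illustration and is not the plugin's API. Only the environment variable name and the `"ON"` comparison come from the snippet above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical free-function version of the check quoted above.
// Multi-part download stays enabled unless the environment variable
// S3_DISABLE_MULTI_PART_DOWNLOAD is set to exactly "ON" (case-sensitive).
bool MultiPartDownloadEnabled() {
  const char *value = std::getenv("S3_DISABLE_MULTI_PART_DOWNLOAD");
  if (value == nullptr) {
    return true;  // variable unset: keep the default
  }
  return std::string(value) != "ON";
}
```

Because the comparison is an exact string match, values such as `on` or `1` leave multi-part download enabled. Setting `S3_DISABLE_MULTI_PART_DOWNLOAD=ON` before starting the process is what switches it off.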