Explore using S3 Transfer Manager
cobookman opened this issue · comments
The AWS SDK provides an S3 Transfer Manager that handles S3 copy-to-memory operations. It looks like the library only makes use of the TransferManager when downloading a multi-part object.
This library seems to use just one thread for downloading non-multipart objects. If the object is large enough, having multiple threads download different chunks of it would improve performance. It might pay to have both multipart and non-multipart objects use the TransferManager.
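To illustrate the idea of several threads each downloading a different chunk, here is a minimal Python sketch. It is not the plugin's code and not the TransferManager API: `split_ranges`, `parallel_download`, and the `fetch_range` callable are hypothetical names, and an in-memory blob stands in for S3 so the example runs without network access. Against real S3, `fetch_range` would presumably issue a ranged GET for the same inclusive byte range.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk_size):
    """Return inclusive (start, end) byte ranges that cover [0, size)."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def parallel_download(fetch_range, size, chunk_size=8 * 1024 * 1024, max_workers=4):
    """Fetch all byte ranges concurrently and reassemble them in order.

    fetch_range(start, end) is a hypothetical callable (an assumption of
    this sketch) that returns the bytes for an inclusive byte range.
    """
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the chunks join correctly
        # even if the individual fetches finish out of order.
        chunks = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(chunks)


# Stand-in for S3: an in-memory blob served by inclusive byte range.
blob = bytes(range(256)) * 100


def fake_fetch(start, end):
    return blob[start:end + 1]


result = parallel_download(fake_fetch, len(blob), chunk_size=1000)
```

The chunk size and worker count here are arbitrary; in practice they would need tuning against object size and available bandwidth.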
We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#165).
We're dropping support for this plugin.
For your question, I believe the S3 Transfer Manager or multi-part downloading is always used, unless directed otherwise:
amazon-s3-plugin-for-pytorch/awsio/csrc/io/s3/s3_io.cpp
Lines 258 to 266 in ed89987