aws / amazon-s3-plugin-for-pytorch


segmentation fault on Plugin tests

mahb324 opened this issue · comments

commented

Hi,
I am trying to build and install the plugin from the git source on EC2 (RHEL 8).
When I run the smoke test, or any other test, a segmentation fault is thrown as soon as I hit _pywrap_s3_io.s3Init().
Do you have any suggestions? Does the plugin work on RHEL?

Can you use containers for the plugin instead?

It is available through the following containers:
763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.9.0-cpu-py38-ubuntu20.04-v1.1

763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.9.0-gpu-py38-cu111-ubuntu20.04-v1.1

763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.8.1-cpu-py36-ubuntu18.04-v1.6

763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.8.1-gpu-py36-cu111-ubuntu18.04-v1.7
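For reference, pulling one of the images above typically looks like this. This is a sketch assuming the AWS CLI v2 and Docker are installed and your credentials are allowed to read the Deep Learning Containers registry; the account ID and region are taken from the image URIs listed above.

```shell
# Authenticate Docker against the AWS Deep Learning Containers ECR registry
# (account 763104351884, region us-west-2, as in the image URIs above).
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Pull one of the listed images, e.g. the 1.9.0 CPU variant.
docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.9.0-cpu-py38-ubuntu20.04-v1.1
```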

commented

I switched over to Ubuntu and the segmentation fault went away. I guess you might not support RHEL, is that true?
I am now trying to run my code on 2.7 GB of data using a map-style dataset and getting very poor performance (~900 s to load the data). Network utilization is very low, and trying different numbers of data-loading workers did not help.
Is there a way to stream the data during dataset creation and data loading to improve performance?

How is your data distributed in the S3 bucket?

For datasets consisting of a large number of small objects, accessing each object individually is inefficient. For such datasets, it is recommended to shard the training data into larger objects (50-200 MB each) and use S3IterableDataset for better performance.
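A minimal sketch of that setup, assuming the shards have already been written to S3 as tar files under a common prefix. The bucket name, prefix, and the `shard-NNNN.tar` naming scheme are hypothetical, as is the `make_shard_urls` helper; the commented S3IterableDataset usage follows the import path shown in the plugin's README, and it is assumed here that iterating yields (filename, bytes) pairs for the files inside each shard:

```python
def make_shard_urls(bucket: str, prefix: str, num_shards: int) -> list:
    """Build S3 URLs for pre-created tar shards (hypothetical naming
    scheme: shard-0000.tar .. shard-NNNN.tar under the given prefix)."""
    return [
        f"s3://{bucket}/{prefix}/shard-{i:04d}.tar" for i in range(num_shards)
    ]

# e.g. 16 shards of roughly 50-200 MB each covering the full dataset
urls = make_shard_urls("my-bucket", "train", 16)

# Sketch of streaming the shards with the plugin (needs torch + the plugin
# installed, and AWS credentials that can read the bucket):
# from torch.utils.data import DataLoader
# from awsio.python.lib.io.s3.s3dataset import S3IterableDataset
#
# dataset = S3IterableDataset(urls, shuffle_urls=True)
# loader = DataLoader(dataset, num_workers=4)
# for fname, data in loader:
#     ...  # each item is assumed to be a (filename, bytes) pair from a shard
```

Because each worker streams whole 50-200 MB objects instead of issuing one GET per small file, the per-request overhead is amortized and network utilization should improve.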

@mahb324

We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#165).
We're dropping support for this plugin.

Closing this issue as there are no further requests from the user. Feel free to re-open if necessary.