Reading object metadata

Question

rkoo19 opened this issue 3 years ago

I was wondering if there is a way to also fetch an object's metadata when reading the object itself. I am trying to use ImageNet to train an image classification model, similar to what is done in s3_imagenet_example.py, but I want to attach the image class as metadata on the object itself.

Ryan Koo · Answer 1 · Tue Nov 16 2021 01:40:47 GMT+0800 (China Standard Time)

So, if I use the map-style S3Dataset, I want to be able to fetch the object itself from my S3 bucket and also fetch a piece of metadata associated w/ that object.

Ryan Koo · Answer 2 · Tue Nov 16 2021 01:45:28 GMT+0800 (China Standard Time)

I was reading the class definition for S3Dataset and saw that, when getting an object, it uses a filename to fetch the object but does nothing with metadata. If this is not already possible, I would like to modify the procedure for getting an object from S3 so that it also fetches the metadata associated w/ the object. I hope this makes sense, and I would appreciate any help I could get!

Ben Snyder · Answer 3 · Tue Nov 16 2021 10:34:39 GMT+0800 (China Standard Time)

How is the object metadata stored? One possibility might be to use the S3BaseClass to write a custom method of reading the file object from S3, then use the filename to read metadata from some other source. For example, here's the setup I use to read an image and annotations for the COCO dataset.

# Methods of a dataset class; assumes io, os, PIL.Image and _pywrap_s3_io are imported.
def _load_image(self, image_id):
    # Create the S3 handler lazily, on first use.
    if self.handler is None:
        self.handler = _pywrap_s3_io.S3Init()
    filename = os.path.join(self.root, self.coco.loadImgs(image_id)[0]["file_name"])
    # Read the object from S3 and decode it with PIL.
    fileobj = self.handler.s3_read(filename)
    return Image.open(io.BytesIO(fileobj)).convert("RGB")

def _load_target(self, image_id):
    # Annotations come from the COCO index, not from S3 object metadata.
    return self.coco.loadAnns(self.coco.getAnnIds(image_id))

def __getitem__(self, idx):
    image_id = self.ids[idx]
    img = self._load_image(image_id)
    anno = self._load_target(image_id)
    target = self.build_target(anno, img.size)
    if self._transforms is not None:
        img, target = self._transforms(img, target)
    return img, target, idx
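
For the ImageNet case in the question, the same "look it up by filename" idea might be sketched as below. This is only an illustration, not part of the plugin: `KeyLabelDataset`, `label_from_parent_dir` and `read_fn` are hypothetical names, and the sketch assumes the bucket mirrors the usual ImageNet layout in which each training image sits in a folder named after its class. `read_fn` stands in for whatever actually reads the bytes (for example, a wrapper around the handler's `s3_read` shown above).

```python
class KeyLabelDataset:
    """Map-style dataset that pairs each object's bytes with a label looked up by key.

    Hypothetical helper for illustration; it is not part of the S3 plugin.
    """

    def __init__(self, keys, labels, read_fn):
        self.keys = list(keys)   # object keys or URLs, one per sample
        self.labels = labels     # mapping: key -> label (the "other source")
        self.read_fn = read_fn   # callable: key -> bytes

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        key = self.keys[idx]
        return self.read_fn(key), self.labels[key]


def label_from_parent_dir(key):
    # Assumes keys look like ".../<class_id>/<file>", as in the ImageNet train split.
    return key.rsplit("/", 2)[-2]


if __name__ == "__main__":
    keys = ["train/n01440764/a.JPEG", "train/n01443537/b.JPEG"]
    labels = {k: label_from_parent_dir(k) for k in keys}
    # Stand-in reader so the sketch runs without S3 access.
    ds = KeyLabelDataset(keys, labels, read_fn=lambda k: b"fake-bytes")
    print(ds[0][1])
```

Because the labels live in a plain mapping, they could just as well be loaded from a CSV or JSON manifest stored next to the images.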

Daiming Yang · Answer 4 · Sat Mar 19 2022 08:20:47 GMT+0800 (China Standard Time)

@johnbensnyder Thanks for helping on this issue!
@rkoo19

We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#318).
We're dropping support for this plugin.

The current S3 plugin doesn't have this feature, and neither do the new S3 IO datapipes. We'll add this feature request to the backlog and implement it in the new S3 IO datapipes.
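
Until then, one possible workaround outside the plugin is to read objects with boto3, whose `get_object` response carries the user-defined metadata under the `"Metadata"` key alongside the `"Body"` stream. The sketch below is a rough illustration under that assumption; `S3ObjectWithMetadata` is a hypothetical class name, and the client is passed in so that anything exposing a compatible `get_object` method can be used.

```python
class S3ObjectWithMetadata:
    """Map-style dataset returning (object bytes, user metadata dict) per key.

    Hypothetical sketch; `client` is expected to behave like boto3.client("s3").
    """

    def __init__(self, client, bucket, keys):
        self.client = client
        self.bucket = bucket
        self.keys = list(keys)

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        resp = self.client.get_object(Bucket=self.bucket, Key=self.keys[idx])
        data = resp["Body"].read()           # raw object bytes
        metadata = resp.get("Metadata", {})  # user-defined metadata, if any
        return data, metadata


# Usage (assumes boto3 is installed and credentials are configured;
# the bucket and key names are placeholders):
#   import boto3
#   ds = S3ObjectWithMetadata(boto3.client("s3"), "my-bucket", ["train/img_0.JPEG"])
#   data, metadata = ds[0]
```

This fetches the object and its metadata in a single request, but it bypasses the plugin's own reader, so any performance benefit the plugin provides would not apply.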