openimages / dataset

The Open Images dataset

Home Page: https://storage.googleapis.com/openimages/web/index.html

(V5) Mismatched image and mask resolutions.

fnw opened this issue

There appear to be several cases where the size of the original image and the size of a segmentation mask belonging to an object in the image are different.

For example, for training image 0cddfe521cf926bf and mask 0cddfe521cf926bf_m0c9ph5_9ae6e629, the original image size is 768x1024, while the mask is 1200x1600. There are also cases where the image is larger than the mask, such as image 092e0c44f0bfe7e3 and mask 092e0c44f0bfe7e3_m01yrx_8db3ace7.

I have observed for a (not so) random sample of 1000 images (they were filtered by object classes first) that only about 6% of the images have masks with matching resolutions. I've also observed for this sample that one of the images seems to have a resolution that is a multiple of the other, i.e. the image was resized by a constant factor, preserving the aspect ratio.
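A quick way to test the constant-factor observation on a pair of sizes is sketched below. The function name `scale_factor` and the 1% tolerance are assumptions made for illustration and are not part of any dataset tooling. Sizes are passed as pairs in the same order for the image and the mask.

```python
def scale_factor(image_size, mask_size, tolerance=0.01):
    """Return the mask/image scale if both sizes share an aspect ratio, else None.

    Hypothetical helper: the 1% tolerance is an assumed allowance for
    rounding to whole pixels during resizing.
    """
    image_a, image_b = image_size
    mask_a, mask_b = mask_size
    scale_a = mask_a / image_a
    scale_b = mask_b / image_b
    # If the two per-axis scales differ noticeably, the aspect ratio changed.
    if abs(scale_a - scale_b) > tolerance * max(scale_a, scale_b):
        return None
    return (scale_a + scale_b) / 2


# Sizes from the example above: image 768x1024, mask 1200x1600.
print(scale_factor((768, 1024), (1200, 1600)))
```

A result of `None` would flag a pair whose aspect ratios differ, which resizing alone cannot fix.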

Is this expected behavior or is there something wrong with my data? I've downloaded the images from CVDF.

If this is expected behavior, what is the recommended approach to matching the size of the two images? While it seems relatively safe to simply reduce the size of the original image when it is the larger of the two, both upscaling the original image and downscaling the mask seem to be poor ideas for the case where the mask is the larger of the images.

Hello all.

Is this expected behavior or is there something wrong with my data?

Indeed, the image masks do not match the released images.
It turns out the images available to download from https://g.co/dataset/open-images (via CVDF or Figure Eight) are resized versions of the original Flickr images.
"The images are rescaled to have at most 1024 pixels on their longest side" says https://github.com/cvdfoundation/open-images-dataset

However, the annotations were made over a set of images with a different resizing rule ("at most 1600 pixels on the largest size"). The good news is that the annotations are therefore at a higher resolution than the downloadable images (the originals on Flickr might be larger still).
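To make the two resizing rules concrete, here is a minimal sketch of how a "longest side at most N pixels" rule maps an original size to a rescaled one. The helper `rescaled_size`, the rounding to the nearest pixel, the choice not to upscale smaller images, and the 3000x4000 original size are all assumptions for illustration; the actual resizing code used for the dataset is not shown in this thread.

```python
def rescaled_size(original_size, max_side):
    """Shrink a size so its longest side is at most max_side, keeping aspect ratio.

    Hypothetical helper: assumes smaller images are left untouched and that
    the shorter side is rounded to the nearest whole pixel.
    """
    first, second = original_size
    longest = max(first, second)
    if longest <= max_side:
        return (first, second)
    return (round(first * max_side / longest), round(second * max_side / longest))


original = (3000, 4000)  # assumed original size, for illustration only
print(rescaled_size(original, 1024))  # size under the download rule
print(rescaled_size(original, 1600))  # size under the annotation rule
```

Under these assumptions the same original yields 768x1024 for the download and 1200x1600 for the mask, which matches the pair of sizes reported in the question.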

I am surprised that you found a case where it is the reverse; I would expect this to be rare.

what is the recommended approach to matching the size of the two images?

In any case, the aspect ratio of the binary mask images should be the same as that of the RGB images, so resizing should be fine.
In general I would recommend upscaling rather than downscaling (normally that means upscaling the RGB image to match the masks), so that no information is lost.
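As a rough illustration of the resizing step, the sketch below resamples a row-major grid of pixel values to a target size using nearest-neighbour sampling. It is dependency-free on purpose and the function name `resize_nearest` is made up for this example; in practice an image library's resize routine would normally be used, typically with a smoother filter for the RGB image. Nearest-neighbour is shown because it never invents intermediate values, so a binary mask stays binary if it ever does need to be resized.

```python
def resize_nearest(pixels, new_width, new_height):
    """Resize a row-major 2D grid of pixel values by nearest-neighbour sampling.

    Hypothetical helper for illustration; pixel values can be ints (mask)
    or tuples (RGB), since values are only copied, never blended.
    """
    old_height = len(pixels)
    old_width = len(pixels[0])
    resized = []
    for y in range(new_height):
        # Map the centre of each output pixel back to a source row.
        src_y = min(old_height - 1, int((y + 0.5) * old_height / new_height))
        row = pixels[src_y]
        resized.append([
            row[min(old_width - 1, int((x + 0.5) * old_width / new_width))]
            for x in range(new_width)
        ])
    return resized


tiny_image = [[0, 1],
              [1, 0]]
upscaled = resize_nearest(tiny_image, 4, 4)  # e.g. bring the image up to mask size
for row in upscaled:
    print(row)
```

The target width and height would be taken from the mask, so that after resizing both arrays share the same pixel grid.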

It's true that many segmentations and images are mismatched, but there are also some images that are much larger than 1024px, e.g. 6e4c43968cdada66.jpg and 099082463344a7ad.jpg.

I know that the issue is old, but I found it while looking for answers to my own problems, and maybe this will help someone else and save them some time. ;)