Issue on bounding box coordinates

Question

tibetgao opened this issue 4 months ago

Hi there,

Looking at the annotations, I noticed that some bbox coordinates exceed the image size (usually 640*480), e.g.:
'\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] doing in the image?\nAnswer the question with scene graph.'

I am wondering whether any extra operation needs to be done (e.g. normalization).

Cheers!

WeiyunWang · Answer 1 · Mon Mar 11 2024 22:20:29 GMT+0800 (China Standard Time)

Thank you for your interest in our project.

All bounding boxes are normalized to integer values within the range [0, 1000). The code shown below demonstrates this process:

# Image size in pixels
height = image.height
width = image.width
# Box corners in pixel coordinates of the original image
bbox = [x1, y1, x2, y2]
BOX_SCALE = 999

if SQUARE_PAD:
    # Shift the box by the padding added along the shorter dimension
    if height == width:
        pass
    elif height < width:
        delta = (width - height) // 2
        bbox[1] += delta
        bbox[3] += delta
    else:
        delta = (height - width) // 2
        bbox[0] += delta
        bbox[2] += delta

    # Both axes are scaled by the side length of the padded square
    bbox = [
        int(bbox[0] / max(height, width) * BOX_SCALE),
        int(bbox[1] / max(height, width) * BOX_SCALE),
        int(bbox[2] / max(height, width) * BOX_SCALE),
        int(bbox[3] / max(height, width) * BOX_SCALE),
    ]
else:
    # x is scaled by the width and y by the height
    bbox = [
        int(bbox[0] / width * BOX_SCALE),
        int(bbox[1] / height * BOX_SCALE),
        int(bbox[2] / width * BOX_SCALE),
        int(bbox[3] / height * BOX_SCALE),
    ]

Note that when SQUARE_PAD is set to True, the image is padded to form a square, so the box is first shifted by the padding offset and then scaled by the longer side of the image.
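
To go the other way and recover approximate pixel coordinates from an annotation, the same steps can be reversed. The following is only a minimal sketch, not code from the project: the function names normalize_bbox and denormalize_bbox are hypothetical, and it assumes the padding is centered exactly as in the snippet above. Because the forward step truncates to integers, the recovered box can be off by about one pixel.

```python
BOX_SCALE = 999  # same constant as in the snippet above


def _scale_and_offset(width, height, square_pad):
    """Return (scale_x, scale_y, dx, dy) for the chosen padding mode."""
    if square_pad:
        side = max(width, height)
        # Centered padding: only the shorter dimension gets a non-zero offset.
        return side, side, (side - width) // 2, (side - height) // 2
    return width, height, 0, 0


def normalize_bbox(bbox, width, height, square_pad=False):
    """Pixel-space [x1, y1, x2, y2] -> integers in [0, BOX_SCALE]."""
    sx, sy, dx, dy = _scale_and_offset(width, height, square_pad)
    x1, y1, x2, y2 = bbox
    return [
        int((x1 + dx) / sx * BOX_SCALE),
        int((y1 + dy) / sy * BOX_SCALE),
        int((x2 + dx) / sx * BOX_SCALE),
        int((y2 + dy) / sy * BOX_SCALE),
    ]


def denormalize_bbox(bbox, width, height, square_pad=False):
    """Normalized [x1, y1, x2, y2] -> approximate pixel coordinates (floats)."""
    sx, sy, dx, dy = _scale_and_offset(width, height, square_pad)
    x1, y1, x2, y2 = bbox
    return [
        x1 / BOX_SCALE * sx - dx,
        y1 / BOX_SCALE * sy - dy,
        x2 / BOX_SCALE * sx - dx,
        y2 / BOX_SCALE * sy - dy,
    ]


if __name__ == "__main__":
    # First box from the question, assuming a 640*480 image without padding.
    print(denormalize_bbox([200, 251, 447, 963], width=640, height=480))
```

With these assumptions, the first box from the question maps back inside a 640*480 image, which is consistent with the values being normalized rather than out of bounds. Whether SQUARE_PAD was enabled for a given annotation file is not stated here, so square_pad should be set to match how the data was generated.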

You can refer to this script for more details about how to visualize these boxes.