OpenGVLab / all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

Home Page: https://huggingface.co/spaces/OpenGVLab/all-seeing

Issue on bounding box coordinates

tibetgao opened this issue

Hi there,

While looking through the annotations, I noticed that some bbox coordinate values exceed the image size (usually 640*480), e.g.:
'\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] doing in the image?\nAnswer the question with scene graph.'

I am wondering whether any extra operation (e.g. normalization) needs to be applied.

Cheers!

Thank you for your interest in our project.

All bounding boxes are normalized to integer values within the range [0, 1000). The code shown below demonstrates this process:

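height = image.height
width = image.width
bbox = [x1, y1, x2, y2]  # pixel coordinates in the original image
BOX_SCALE = 999  # largest normalized value, so results stay within [0, 1000)

if SQUARE_PAD:
    # Shift the box by half the size difference along the shorter side,
    # so that it stays aligned with the image inside the padded square.
    if height == width:
        pass
    elif height < width:
        delta = (width - height) // 2
        bbox[1] += delta
        bbox[3] += delta
    else:
        delta = (height - width) // 2
        bbox[0] += delta
        bbox[2] += delta

    # Both axes are scaled by the side length of the padded square.
    bbox = [
        int(bbox[0] / max(height, width) * BOX_SCALE),
        int(bbox[1] / max(height, width) * BOX_SCALE),
        int(bbox[2] / max(height, width) * BOX_SCALE),
        int(bbox[3] / max(height, width) * BOX_SCALE),
    ]
else:
    # Without padding, x is scaled by the width and y by the height.
    bbox = [
        int(bbox[0] / width * BOX_SCALE),
        int(bbox[1] / height * BOX_SCALE),
        int(bbox[2] / width * BOX_SCALE),
        int(bbox[3] / height * BOX_SCALE),
    ]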

Note that when SQUARE_PAD is set to True, the image will be padded to form a square.

You can refer to this script for more details about how to visualize these boxes.
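
For readers who want to map the normalized values back to pixels, the following is a minimal sketch rather than code from the repository. It wraps the normalization above in a function and adds an approximate inverse. The names `normalize_bbox` and `denormalize_bbox` and the `square_pad` argument are hypothetical, and whether a given annotation was produced with SQUARE_PAD enabled is something you would need to check for your data.

```python
BOX_SCALE = 999  # same constant as in the snippet above


def _pad_offsets(width, height):
    """Offsets implied by padding the image to a centered square (assumption)."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def normalize_bbox(bbox, width, height, square_pad=False):
    """Map pixel [x1, y1, x2, y2] to integers in [0, 1000)."""
    x1, y1, x2, y2 = bbox
    if square_pad:
        side, dx, dy = _pad_offsets(width, height)
        shifted = [x1 + dx, y1 + dy, x2 + dx, y2 + dy]
        return [int(v / side * BOX_SCALE) for v in shifted]
    return [
        int(x1 / width * BOX_SCALE),
        int(y1 / height * BOX_SCALE),
        int(x2 / width * BOX_SCALE),
        int(y2 / height * BOX_SCALE),
    ]


def denormalize_bbox(bbox, width, height, square_pad=False):
    """Approximate inverse: normalized box back to original-image pixels.

    The result is a float and can differ from the original box by a
    fraction of a pixel, because normalization truncates to integers.
    """
    x1, y1, x2, y2 = bbox
    if square_pad:
        side, dx, dy = _pad_offsets(width, height)
        return [
            x1 / BOX_SCALE * side - dx,
            y1 / BOX_SCALE * side - dy,
            x2 / BOX_SCALE * side - dx,
            y2 / BOX_SCALE * side - dy,
        ]
    return [
        x1 / BOX_SCALE * width,
        y1 / BOX_SCALE * height,
        x2 / BOX_SCALE * width,
        y2 / BOX_SCALE * height,
    ]
```

For example, `denormalize_bbox([200, 251, 447, 963], 640, 480)` would convert the first box from the question into pixel coordinates for a 640*480 image, assuming it was normalized without square padding.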