Issue on bounding box coordinates
tibetgao opened this issue
Hi there,
While looking through the annotations, I noticed that some bbox coordinate values exceed the image dimensions (usually 640×480), e.g.:
'\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] doing in the image?\nAnswer the question with scene graph.'
Is there an extra step (e.g. normalization) that needs to be applied to these values?
Cheers!
Thank you for your interest in our project.
All bounding boxes are normalized to integer values within the range [0, 1000). The code shown below demonstrates this process:
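height = image.height  # image: the source image (e.g. a PIL Image)
width = image.width
bbox = [x1, y1, x2, y2]  # box in pixel coordinates
BOX_SCALE = 999  # int() truncation keeps every value in [0, 999]
if SQUARE_PAD:
    # Pad the shorter side equally on both sides and shift the box to match.
    if height == width:
        pass
    elif height < width:
        delta = (width - height) // 2
        bbox[1] += delta
        bbox[3] += delta
    else:
        delta = (height - width) // 2
        bbox[0] += delta
        bbox[2] += delta
    # Normalize every coordinate by the longer side of the padded square.
    bbox = [
        int(bbox[0] / max(height, width) * BOX_SCALE),
        int(bbox[1] / max(height, width) * BOX_SCALE),
        int(bbox[2] / max(height, width) * BOX_SCALE),
        int(bbox[3] / max(height, width) * BOX_SCALE),
    ]
else:
    # Normalize x by the width and y by the height independently.
    bbox = [
        int(bbox[0] / width * BOX_SCALE),
        int(bbox[1] / height * BOX_SCALE),
        int(bbox[2] / width * BOX_SCALE),
        int(bbox[3] / height * BOX_SCALE),
    ]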
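For anyone who wants to experiment with this, here is the same logic wrapped in a self-contained function. This is a minimal sketch: normalize_bbox and its square_pad and box_scale parameters are hypothetical names chosen for illustration, not part of the project's code.

```python
def normalize_bbox(bbox, width, height, square_pad=False, box_scale=999):
    """Map a pixel-space [x1, y1, x2, y2] box to integers in [0, box_scale].

    Hypothetical helper mirroring the snippet above.
    """
    x1, y1, x2, y2 = bbox
    if square_pad:
        # Pad the shorter side equally on both sides, shift the box accordingly,
        # then normalize every coordinate by the longer side.
        side = max(width, height)
        if height < width:
            delta = (width - height) // 2
            y1, y2 = y1 + delta, y2 + delta
        elif width < height:
            delta = (height - width) // 2
            x1, x2 = x1 + delta, x2 + delta
        return [int(v / side * box_scale) for v in (x1, y1, x2, y2)]
    # Without padding, x is normalized by width and y by height independently.
    return [
        int(x1 / width * box_scale),
        int(y1 / height * box_scale),
        int(x2 / width * box_scale),
        int(y2 / height * box_scale),
    ]


print(normalize_bbox([64, 48, 320, 240], 640, 480))
# → [99, 99, 499, 499]
```

With square_pad=True the same box on a 640×480 image is first shifted down by 80 pixels, so its y values change while its x values do not.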
Note that when SQUARE_PAD is set to True, the image is first padded to a square, so the box is shifted by the padding offset and every coordinate is normalized by the longer side.
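To map the values in the annotations back to pixels, the steps can be reversed. The sketch below assumes you know whether SQUARE_PAD was used for a given dataset; denormalize_bbox is a hypothetical name, and because int() truncates during normalization, the result is only approximate.

```python
def denormalize_bbox(bbox, width, height, square_pad=False, box_scale=999):
    """Approximately invert the [0, box_scale] normalization back to pixels.

    Hypothetical helper; assumes the same square_pad setting used to normalize.
    """
    x1, y1, x2, y2 = bbox
    if square_pad:
        # Scale back up by the longer side of the padded square.
        side = max(width, height)
        x1, y1, x2, y2 = (v / box_scale * side for v in (x1, y1, x2, y2))
        # Undo the offset added to the shorter side by the padding.
        if height < width:
            delta = (width - height) // 2
            y1, y2 = y1 - delta, y2 - delta
        elif width < height:
            delta = (height - width) // 2
            x1, x2 = x1 - delta, x2 - delta
        return [x1, y1, x2, y2]
    # Without padding, x scales with width and y with height.
    return [
        x1 / box_scale * width,
        y1 / box_scale * height,
        x2 / box_scale * width,
        y2 / box_scale * height,
    ]


print(denormalize_bbox([200, 251, 447, 963], 640, 480))
```

Assuming SQUARE_PAD was False, the first box from the question maps to roughly (128.1, 120.6, 286.4, 462.7) on a 640×480 image, which is inside the image bounds. Since truncation loses less than one normalized unit, the round trip is accurate to within about one pixel for images no larger than 999 pixels on a side.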
You can refer to this script for more details about how to visualize these boxes.
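Before drawing anything, the boxes have to be pulled out of the annotation text. The following is a hypothetical helper, not the visualization script mentioned above; it assumes boxes always appear as bracketed groups of four non-negative integers, as in the example from the question.

```python
import re

# Matches one [x1, y1, x2, y2] group of non-negative integers.
BOX_PATTERN = re.compile(r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]")


def extract_boxes(text):
    """Return every [x1, y1, x2, y2] integer box found in an annotation string."""
    return [[int(g) for g in m.groups()] for m in BOX_PATTERN.finditer(text)]


prompt = ("\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] "
          "doing in the image?\nAnswer the question with scene graph.")
print(extract_boxes(prompt))
# → [[200, 251, 447, 963], [529, 246, 744, 984]]
```

The outer bracket of a nested list like [[...], [...]] is skipped because it is not followed by a digit, so each inner box is matched on its own.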