aras62 / PIEPredict

PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction

Bounding box normalization

stratomaster31 opened this issue · comments

Hi there,

I'm playing around with your repo, which is actually very well written. I'm trying to predict pedestrian intention (performing inference) on my own sequences, so I'm not using pie_data.py. To check my pipeline, I first tried predicting on the images extracted from the PIE_clips, but I'm getting odd results (a prediction above 0.9 for every sequence). I'm following your code for the preprocessing stages (jitter, squarify, crop and im_pad, vgg16.preprocess), so I don't think the problem is there (I'm assuming images are in RGB format, not BGR).

My problems might be with the bounding boxes:

  1. Which bboxes/locations should be fed: the original locations, or the expanded locations (with the 2x factor for local context)?
  2. How is the bounding box normalization done? I'm proceeding as follows:
bbox[0] /= image_w
bbox[1] /= image_h
bbox[2] /= image_w
bbox[3] /= image_h

where bbox = [x_0, y_0, x_1, y_1] holds the top-left and bottom-right corners of the bbox.
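In code, the scaling I'm doing looks roughly like this (the function name is just illustrative):

```python
def normalize_bbox_by_image(bbox, image_w, image_h):
    """Scale [x_0, y_0, x_1, y_1] corner coordinates into [0, 1]
    by dividing x by the image width and y by the image height."""
    x0, y0, x1, y1 = bbox
    return [x0 / image_w, y0 / image_h, x1 / image_w, y1 / image_h]
```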

I could open a pull request with the inference code, which (if it works) might be of great help for performing easy inference on a given sequence of images.

Thank you very much!

Hi there,

Thanks for using our dataset. To answer your questions:

1- For the trajectory stream of the code, we used the bounding box coordinates that contain only the pedestrian, because no visual features are used there. We used the context bboxes only for the intention stream.

2- Normalization is done by subtracting the first bbox's coordinates from the rest of the bboxes in the given sequence, e.g.:
seq = [bbox_0, bbox_1, bbox_2, ...., bbox_n]
bbox_1_norm = bbox_1 - bbox_0
bbox_2_norm = bbox_2 - bbox_0
....
seq_norm = [bbox_1_norm, ...., bbox_n_norm]

You can think of the normalization as converting the coordinates to velocities (displacements relative to the start of the sequence). Since the first bbox is always zero after subtraction, it is omitted.
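A minimal sketch of this normalization (the function name is illustrative):

```python
import numpy as np

def normalize_sequence(seq):
    """Subtract the first bbox from every bbox in the sequence,
    then drop the (all-zero) first entry."""
    seq = np.asarray(seq, dtype=float)
    return seq[1:] - seq[0]

# A 3-frame sequence of [x_0, y_0, x_1, y_1] boxes:
seq = [[10, 20, 30, 40], [12, 21, 33, 42], [15, 23, 36, 45]]
normalize_sequence(seq)  # → [[2., 1., 3., 2.], [5., 3., 6., 5.]]
```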

I hope my answer helps,

Oh, thanks, that is a great help! So, for a sequence of 15 detections, only 14 bboxes are fed to the PIEIntent decoder (omitting bbox_0)? The problem is that model.h5 for PIEIntent expects a decoder_input of shape (15, 4).

Correct

The problem is that model.h5 for PIEIntent expects a decoder_input of shape (15,4)

Normalization is only done for the trajectory module. O_intent gets the coordinates as is, so its shape is (15, 4).

OK then, so for PIEIntent no normalization is performed?

Correct.
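Putting the two answers together, a minimal sketch of how the two streams see a 15-frame observation (variable names are illustrative):

```python
import numpy as np

# 15 observed [x_0, y_0, x_1, y_1] boxes for one pedestrian
seq = np.arange(60, dtype=float).reshape(15, 4)

# Trajectory stream: subtract the first bbox, drop the zero entry -> (14, 4)
traj_input = seq[1:] - seq[0]

# Intention stream: raw coordinates, no normalization -> (15, 4)
intent_input = seq

print(traj_input.shape, intent_input.shape)  # (14, 4) (15, 4)
```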

Thank you very much for your responses and your great work!