aras62 / PIEPredict

PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction

Bounding box normalization

stratomaster31 opened this issue · comments

Hi there,

I'm playing around with your repo, which is actually very well written. I'm trying to predict pedestrian intention (performing inference) on my own sequences, so I'm not using pie_data.py. To check my pipeline, I first tried predicting on the images extracted from the PIE_clips, but I'm getting odd results (a prediction above 0.9 for every sequence). I'm following your code for the preprocessing stages (jitter, squarify, crop and im_pad, vgg16.preprocess), so I don't think the problem is there (I'm assuming images are in RGB format, not BGR).

My problems might be with the bounding boxes:

  1. Which bboxes/locations should be fed: the original locations, or the expanded locations (with the 2x factor for local context)?
  2. How is the bounding box normalization done? I'm proceeding as follows:
bbox[0] /= image_w
bbox[1] /= image_h
bbox[2] /= image_w
bbox[3] /= image_h

where bbox = [x_0, y_0, x_1, y_1] holds the top-left and bottom-right corners of the bbox.
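In code, the scaling I'm doing looks roughly like this (the function name is just illustrative):

```python
def normalize_bbox_by_image(bbox, image_w, image_h):
    """Scale [x_0, y_0, x_1, y_1] corner coordinates into [0, 1]
    by dividing x by the image width and y by the image height."""
    x0, y0, x1, y1 = bbox
    return [x0 / image_w, y0 / image_h, x1 / image_w, y1 / image_h]
```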

I could open a pull request with the inference code, which (if it works) might be of great help for performing easy inference on a given sequence of images.

Thank you very much!

Hi there,

Thanks for using our dataset. To answer your questions:

1- For the trajectory stream of the code, we used the bounding box coordinates that contain only the pedestrian, because no visual features are used there. We used the context bboxes only for the intention stream.

2- Normalization is done by subtracting the first bbox's coordinates from the rest of the bboxes in the given sequence, e.g.:
seq = [bbox_0, bbox_1, bbox_2, ...., bbox_n]
bbox_1_norm = bbox_1 - bbox_0
bbox_2_norm = bbox_2 - bbox_0
....
seq_norm = [bbox_1_norm, ...., bbox_n_norm]

You can think of the normalization as converting the coordinates to velocities (displacements relative to the start of the sequence). Since the first bbox is always zero after subtraction, it is omitted.
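A minimal sketch of this normalization (the function name is illustrative):

```python
import numpy as np

def normalize_sequence(seq):
    """Subtract the first bbox from every bbox in the sequence,
    then drop the (all-zero) first entry."""
    seq = np.asarray(seq, dtype=float)
    return seq[1:] - seq[0]

# A 3-frame sequence of [x_0, y_0, x_1, y_1] boxes:
seq = [[10, 20, 30, 40], [12, 21, 33, 42], [15, 23, 36, 45]]
normalize_sequence(seq)  # → [[2., 1., 3., 2.], [5., 3., 6., 5.]]
```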

I hope my answer helps,

Oh, thanks, that is a great help! So, for a sequence of 15 detections, only 14 bboxes are fed to the PIEIntent decoder (omitting bbox_0)? The problem is that model.h5 for PIEIntent expects a decoder_input of shape (15, 4).

Correct

The problem is that model.h5 for PIEIntent expects a decoder_input of shape (15,4)

Normalization is only done for the trajectory module. O_intent gets the coordinates as is, so its shape is (15, 4).

OK then, so for PIEIntent no normalization is performed?

Correct.
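Putting the two answers together, a minimal sketch of how the two streams see a 15-frame observation (variable names are illustrative):

```python
import numpy as np

# 15 observed [x_0, y_0, x_1, y_1] boxes for one pedestrian
seq = np.arange(60, dtype=float).reshape(15, 4)

# Trajectory stream: subtract the first bbox, drop the zero entry -> (14, 4)
traj_input = seq[1:] - seq[0]

# Intention stream: raw coordinates, no normalization -> (15, 4)
intent_input = seq

print(traj_input.shape, intent_input.shape)  # (14, 4) (15, 4)
```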

Thank you very much for your responses and your great work!