hustvl / VAD

[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Home Page: https://arxiv.org/abs/2303.12077

Why is the code of the L2 metric changed from ST-P3?

buaazeus opened this issue

commented

I noticed the L2 metric code is changed from ST-P3; can you please explain why?
Thank you.

def compute_L2(self, trajs, gt_trajs):
    '''
    trajs: torch.Tensor (n_future, 2)
    gt_trajs: torch.Tensor (n_future, 2)
    '''
    # Batched, ST-P3-style variant kept for reference (note the extra batch dim):
    # return torch.sqrt(((trajs[:, :, :2] - gt_trajs[:, :, :2]) ** 2).sum(dim=-1))
    pred_len = trajs.shape[0]
    # Per-timestep Euclidean (L2) error, averaged over all future steps,
    # i.e. the average displacement error (ADE) up to the horizon.
    ade = float(
        sum(
            torch.sqrt(
                (trajs[i, 0] - gt_trajs[i, 0]) ** 2
                + (trajs[i, 1] - gt_trajs[i, 1]) ** 2
            )
            for i in range(pred_len)
        )
        / pred_len
    )

    return ade

# def update(self, trajs, gt_trajs, segmentation):
#     '''
#     trajs: torch.Tensor (B, n_future, 3)
#     gt_trajs: torch.Tensor (B, n_future, 3)
#     segmentation: torch.Tensor (B, n_future, 200, 200)
#     '''
#     assert trajs.shape == gt_trajs.shape
#     L2 = self.compute_L2(trajs, gt_trajs)
#     obj_coll_sum, obj_box_coll_sum = self.evaluate_coll(trajs[:,:,:2], gt_trajs[:,:,:2], segmentation)

#     if torch.isnan(L2).max().item():
#         debug = 1
#     else:
#         self.obj_col += obj_coll_sum
#         self.obj_box_col += obj_box_coll_sum
#         self.L2 += L2.sum(dim=0)
#         if torch.isnan(self.L2).max().item():
#             debug = 1
#         self.total += len(trajs)


# def compute(self):
#     return {
#         'obj_col': self.obj_col / self.total,
#         'obj_box_col': self.obj_box_col / self.total,
#         'L2': self.L2 / self.total
#     }
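
For reference, a minimal sketch (dummy tensors; the variable names here are mine, not from either codebase) contrasting the two return conventions in question:

import torch

trajs = torch.tensor([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])  # (n_future, 2)
gt_trajs = torch.zeros_like(trajs)

# ST-P3-style return (cf. the commented-out line in compute_L2 above,
# minus the batch dimension): a per-timestep L2 vector, averaged later.
l2_per_step = torch.sqrt(((trajs - gt_trajs) ** 2).sum(dim=-1))  # tensor([1., 2., 3.])

# VAD-style return: the time average (ADE) is taken inside compute_L2.
ade = float(l2_per_step.mean())  # 2.0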

We do not use the L2 metric code from ST-P3; it comes from another of our projects, but the results should be the same. We calculate the average inside the compute_L2 function, while ST-P3 calculates the average when returning the metric results.
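
To sketch that claim on dummy data (illustrative names, not from either codebase): for a fixed horizon, averaging over time first and then over the batch, as compute_L2 does, matches ST-P3's batch-then-time order, since the mean is linear.

import torch

B, n_future = 4, 6
trajs = torch.randn(B, n_future, 2)
gt_trajs = torch.randn(B, n_future, 2)
# Per-sample, per-timestep L2 error, shape (B, n_future).
l2 = torch.sqrt(((trajs - gt_trajs) ** 2).sum(dim=-1))

# VAD order: time average inside compute_L2, then batch average.
vad_style = l2.mean(dim=1).mean()
# ST-P3 order: accumulate over the batch per timestep, time-average at the end.
stp3_style = (l2.sum(dim=0) / B).mean()

assert torch.allclose(vad_style, stp3_style)  # equal by linearity of the mean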

commented

I found there is a difference. ST-P3 computes L2 at the timestamps 1s, 2s, and 3s, while VAD reports the average of (0.5s, 1s) as the result for 1s, the average of (0.5s, 1s, 1.5s, 2s) as the result for 2s, and the average of (0.5s, 1s, 1.5s, 2s, 2.5s, 3s) as the result for 3s.
The averaging code in ST-P3 operates over the batch, not over time.
Can you please double-check?

I also noticed this problem. Could you double-check this?

In metric.py of ST-P3, the compute function averages over the batch, and ST-P3 additionally averages over the time dimension when assembling the final results in evaluate.py:

if cfg.PLANNING.ENABLED:
    for i in range(future_second):
        scores = metric_planning_val[i].compute()
        for key, value in scores.items():
            # value.mean() averages the per-timestep metric over the time dimension
            results['plan_' + key + '_{}s'.format(i + 1)] = value.mean()

VAD follows this setting for a fair comparison; the only difference is that VAD performs the time-dimension average inside the compute_L2 function, while ST-P3 performs it when calculating the final metric results.
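
As a worked toy example (assuming the 0.5 s sampling discussed above, with made-up error values), both pipelines report the same number for the 2s horizon:

import torch

# Per-timestep L2 errors at 0.5s, 1s, 1.5s, and 2s (made-up values).
l2_per_step = torch.tensor([0.2, 0.4, 0.6, 0.8])

# VAD: compute_L2 already returns the time average for the 2s horizon.
vad_2s = float(l2_per_step.mean())  # 0.5

# ST-P3: compute() yields the per-timestep (batch-averaged) tensor, and
# value.mean() in evaluate.py takes the same time average at report time.
stp3_2s = float(l2_per_step.mean())  # 0.5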

commented

Got it.
Thanks for the reply.