hustvl / VAD

[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Home Page: https://arxiv.org/abs/2303.12077


Question about L2 computation

wljungbergh opened this issue

Hi, and thank you for your work.

When reviewing your evaluation code, I found that your L2 displacement error (here) is computed as the average displacement error up to and including that particular timestep. This differs from how previous works (e.g., UniAD and ST-P3) have defined the metric: they compute it as the L2 norm at that particular timestep alone (see here and here).

I might have misunderstood your code; if so, please let me know. If not, could you provide the numbers using the metric definition from ST-P3 and UniAD? That would make the results more directly comparable.

Can you please shed some light on this? Which of the two definitions is considered correct? (It may well be that UniAD and ST-P3 have defined the metric incorrectly.)
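To make the difference concrete, here is a minimal NumPy sketch of the two definitions (the trajectories below are made up for illustration): the "pointwise" definition takes the norm of the error exactly at timestep t, while the "averaged" definition takes the mean of those norms over all steps up to and including t.

```python
import numpy as np

# Toy planned and ground-truth ego trajectories, shape (T, 2) = (x, y) per
# future timestep. Values are illustrative only.
pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt   = np.array([[0.0, 0.1], [1.0, 0.3], [2.0, 0.8]])

# Per-timestep L2 norm: the error exactly at timestep t.
per_step = np.linalg.norm(pred - gt, axis=1)

# Averaged-up-to-t L2: mean of the norms for steps 1..t (the more lenient
# definition, since early timesteps usually have smaller errors).
avg_up_to_t = np.cumsum(per_step) / np.arange(1, len(per_step) + 1)
```

With these toy values, `per_step` is `[0.1, 0.3, 0.8]` while `avg_up_to_t` is `[0.1, 0.2, 0.4]`, so the averaged variant reports a noticeably smaller error at longer horizons.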

Thanks,

Please refer to this issue.

Thanks a lot for the clarification. I don't know how I missed that issue... sorry about that. I now see that you define the metric similarly to ST-P3.

However, upon digging into the UniAD code, I found that it does not conform to the definition from ST-P3, which they have acknowledged here.

planning_results_computed = results["planning_results_computed"]
planning_tab = PrettyTable()
planning_tab.field_names = [
    "metrics",
    "0.5s",
    "1.0s",
    "1.5s",
    "2.0s",
    "2.5s",
    "3.0s",
]
for key in planning_results_computed.keys():
    value = planning_results_computed[key]
    row_value = []
    row_value.append(key)
    for i in range(len(value)):
        row_value.append("%.4f" % float(value[i]))
    planning_tab.add_row(row_value)

Here, planning_results_computed holds the results of a single PlanningMetric.compute() call (with n_future=6), meaning that they compute the L2 distance as the pointwise norm at each timestep rather than the mean of the norms up to that timestep.

Because of this, the comparison between your method and UniAD is misleading: VAD's numbers use the more lenient metric definition, while UniAD's numbers are presented in the same table but computed under a different definition.

It would reduce the confusion if you added their performance under your (and ST-P3's original) metric definition.

Here are their displacement values when using your (and ST-P3) metric definition:

| Method | L2 (m) 1s | L2 (m) 2s | L2 (m) 3s |
| --- | --- | --- | --- |
| ST-P3 | 1.33 | 2.11 | 2.90 |
| UniAD (their metric) | 0.48 | 0.96 | 1.65 |
| UniAD (your metric) | 0.42 | 0.64 | 0.91 |
| VAD-Tiny | 0.46 | 0.76 | 1.12 |
| VAD-Base | 0.41 | 0.70 | 1.05 |

I will post these results on their GitHub as well, in case they want to update their numbers (or show both sets side by side).

FYI, to comply with your metric definition we simply changed the code above to:

for i in range(len(value)):
    # average of the per-timestep L2 norms up to and including step i
    row_value.append("%.4f" % float(value[:i + 1].mean()))
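For reference, a standalone sketch of that running-average transformation, assuming `value` is a NumPy array of per-timestep L2 norms (the numbers below are made up, not actual UniAD results):

```python
import numpy as np

# Hypothetical per-timestep L2 norms at 0.5 s horizon steps (illustrative only).
value = np.array([0.20, 0.35, 0.48, 0.70, 0.83, 0.96])

# Running average up to and including each timestep (VAD / ST-P3 style):
# entry i is the mean of value[0..i].
running_avg = [float(value[:i + 1].mean()) for i in range(len(value))]
```

Each reported horizon thus mixes in the (typically smaller) early-timestep errors, which is why the averaged numbers come out lower than the pointwise ones.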

PS. Please let us know if you think we've missed something and computed UniAD's performance incorrectly under your metric.