hustvl / VAD

[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Home Page: https://arxiv.org/abs/2303.12077


Can't reproduce results with pretrained weights

SimonDoll opened this issue · comments

Hi,
thanks a lot for your great work and efforts to release code and pretrained models!

I followed your instructions for environment setup and used your data .pkl and pretrained weights.
Using the provided inference and visualization commands does not reproduce your results for me.
Sadly, all detection and segmentation metrics are close to zero:

mAP: 0.0000
mATE: 1.0000
mASE: 1.0000
mAOE: 1.0000
mAVE: 1.0000
mAAE: 1.0000
NDS: 0.0000
Eval time: 20.6s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.000   1.000   1.000   1.000   1.000   1.000
truck   0.000   1.000   1.000   1.000   1.000   1.000
bus     0.000   1.000   1.000   1.000   1.000   1.000
trailer 0.000   1.000   1.000   1.000   1.000   1.000
construction_vehicle    0.000   1.000   1.000   1.000   1.000   1.000
pedestrian      0.000   1.000   1.000   1.000   1.000   1.000
motorcycle      0.000   1.000   1.000   1.000   1.000   1.000
bicycle 0.000   1.000   1.000   1.000   1.000   1.000
traffic_cone    0.000   1.000   1.000   nan     nan     nan
barrier 0.000   1.000   1.000   1.000   nan     nan
-*-*-*-*-*-*-*-*-*-*threshhold:1.5-*-*-*-*-*-*-*-*-*-*
cls:divider done in 10.671474s!!
cls:ped_crossing done in 1.378579s!! 
cls:boundary done in 11.997211s!!

+--------------+-------+--------+--------+-------+
| class        | gts   | dets   | recall | ap    |
+--------------+-------+--------+--------+-------+
| divider      | 27332 | 88237  | 0.085  | 0.003 |
| ped_crossing | 6406  | 24793  | 0.049  | 0.001 |
| boundary     | 21050 | 187920 | 0.114  | 0.002 |
+--------------+-------+--------+--------+-------+
| mAP          |       |        |        | 0.002 |
+--------------+-------+--------+--------+-------+
divider: 0.001115116495687592
ped_crossing: 0.00023997755279954922 
boundary: 0.0007757550420137704
map: 0.0007102830301669705

Also, the resulting visualizations look distorted (similar to #8):
[screenshot: distorted visualization output]

I used the VAD_base_stage_2.py config with the provided pretrained weights.

  • Can you help me figure out what I configured wrong? Do the pretrained weights match the given config and data preparation?
  • Can you also provide the result metrics of stage-1 and stage-2 training?

Thank you so much in advance for your help!

I might have found a solution:
[screenshot: config diff]
The img_norm_cfg was changed in 156a744, as discussed in #9. It seems this config does not match the pretrained weights. Changing the corresponding line to:

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)

seems to fix the issue (see the visualization results above), but I need to continue the investigation.
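For anyone comparing the two settings: the difference matters because the two configs imply entirely different input distributions. A minimal sketch of the effect, assuming the `normalize` helper below mimics mmcv-style image normalization and that the alternative config used the standard ImageNet RGB mean/std (both are my assumptions, not taken from the repo):

```python
import numpy as np

# Caffe-style config (the one that matches the pretrained weights here):
# keep BGR channel order, subtract per-channel mean, std stays 1.0.
caffe_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)

# torchvision-style config (assumed alternative): convert to RGB,
# then normalize with the usual ImageNet mean/std on the 0-255 scale.
torch_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

def normalize(img_bgr, cfg):
    """Hypothetical helper mimicking mmcv.imnormalize:
    optional BGR->RGB flip, then (x - mean) / std per channel."""
    img = img_bgr[..., ::-1] if cfg['to_rgb'] else img_bgr
    return (img.astype(np.float32) - np.array(cfg['mean'])) / np.array(cfg['std'])

pixel = np.array([[[90.0, 120.0, 140.0]]])  # a single BGR pixel, shape (1, 1, 3)
# Mean-subtracted values in the tens vs. unit-scale values: feeding a
# network weights trained under one scheme with inputs from the other
# shifts every activation, which is consistent with near-zero metrics.
print(normalize(pixel, caffe_cfg))
print(normalize(pixel, torch_cfg))
```

So if the metrics collapse to zero with otherwise correct setup, checking that img_norm_cfg matches the checkpoint's training-time setting is a cheap first test.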

Hi,
When I executed the visualization script with the checkpoint provided by the author, an error occurred.
Python command: python tools/analysis_tools/visualization.py --result-path test/VAD_base_stage_2/Sat_Sep__9_02_04_54_2023/pts_bbox/results_nusc.pkl --save-path vis_results
Error:
Exception has occurred: ValueError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
File "/workspace/projects/VAD/projects/mmdet3d_plugin/core/bbox/structures/nuscenes_box.py", line 268, in render_fut_trajs_grad_color
fut_coord = np.concatenate((self.center[np.newaxis, :2], fut_coord), axis=0)
File "/workspace/projects/VAD/tools/analysis_tools/visualization.py", line 567, in visualize_sample
box.render_fut_trajs_grad_color(axes, linewidth=1, mode_idx=mode_idx, fut_ts=6, cmap='autumn')
File "/workspace/projects/VAD/tools/analysis_tools/visualization.py", line 315, in lidiar_render
visualize_sample(nusc, sample_token, gt_annotations, pred_annotations,
File "/workspace/projects/VAD/tools/analysis_tools/visualization.py", line 719, in render_sample_data
lidiar_render(sample_toekn, pred_data, out_path=out_path,
File "/workspace/projects/VAD/tools/analysis_tools/visualization.py", line 747, in
render_sample_data(sample_token_list[id],
File "/opt/conda/envs/vad/lib/python3.8/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/vad/lib/python3.8/runpy.py", line 192, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
Have you ever met this problem? If so, could you give me some suggestions on how to solve it? Thank you very much!
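For what it's worth, the ValueError itself can be reproduced in isolation: np.concatenate refuses to join a 2-D array with a 3-D one along axis 0. A sketch, assuming fut_coord carries an extra leading mode axis of shape (num_modes, fut_ts, 2) (that shape is my guess from the traceback, not confirmed from the repo):

```python
import numpy as np

center_xy = np.array([1.0, 2.0])     # like self.center[:2], shape (2,)
fut_coord = np.zeros((1, 6, 2))      # assumed (num_modes, fut_ts, 2), 3-D

# This mirrors the failing line in render_fut_trajs_grad_color:
# center_xy[np.newaxis, :] is 2-D, fut_coord is 3-D -> ValueError.
try:
    np.concatenate((center_xy[np.newaxis, :], fut_coord), axis=0)
except ValueError as e:
    print(e)

# One hypothetical fix: drop the extra leading axis first so both
# operands are 2-D arrays of shape (N, 2).
fixed = np.concatenate((center_xy[np.newaxis, :], fut_coord.squeeze(0)), axis=0)
print(fixed.shape)  # (1 + fut_ts, 2)
```

If the real fut_coord has a different layout, the same principle applies: both inputs to np.concatenate must have the same number of dimensions, so the array with the extra axis needs to be squeezed or indexed before the call.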


Yes, we updated img_norm_cfg based on the suggestion from that issue, and it differs from the setting we used when training the model. That is the reason for the problem you encountered here. In the latest commit, we have added an explanation of this issue to help people reproduce the results correctly. Thank you for your valuable suggestion.