Two-stage training evaluation

Question

alfredgu001324 opened this issue a year ago

Hi, I have trained two models (stage 1 and stage 2) following your instructions. However, when I run this command:
CUDA_VISIBLE_DEVICES=0 python tools/test.py /home/guxunjia/Desktop/VAD/projects/configs/VAD/VAD_base_stage_2.py /home/guxunjia/Desktop/VAD/work_dirs/VAD_base_stage_2/epoch_12.pth --launcher none --eval bbox --tmpdir tmp

The following error occurs:

projects.mmdet3d_plugin
WARNING!!!!, Only can be used for obtain inference speed!!!!
load checkpoint from local path: /home/guxunjia/Desktop/VAD/work_dirs/VAD_base_stage_2/epoch_12.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 5.0 task/s, elapsed: 16s, ETA: 0s
Traceback (most recent call last):
  File "tools/test.py", line 294, in <module>
    main()
  File "tools/test.py", line 274, in main
    print(dataset.evaluate(outputs['bbox_results'], **eval_kwargs))
  File "/home/guxunjia/Desktop/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1786, in evaluate
    result_dict['ADE_'+cls] = all_metric_dict['ADE_'+cls] / all_metric_dict['cnt_ade_'+cls]
ZeroDivisionError: float division by zero

I am using the mini dataset for training and eval. I checked the 'all_metric_dict' and it shows the following:

(Pdb) all_metric_dict
{'gt_car': 701.0, 'gt_pedestrian': 659.0,
 'cnt_ade_car': 0.0, 'cnt_ade_pedestrian': 2.0,
 'cnt_fde_car': 0.0, 'cnt_fde_pedestrian': 0.0,
 'hit_car': 0.0, 'hit_pedestrian': 0.0,
 'fp_car': 46.0, 'fp_pedestrian': 0.0,
 'ADE_car': 0.0, 'ADE_pedestrian': tensor(2.8956),
 'FDE_car': 0.0, 'FDE_pedestrian': 0.0,
 'MR_car': 0.0, 'MR_pedestrian': 0.0}
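
The traceback and the dictionary above line up: 'cnt_ade_car' is 0.0, so the division for 'ADE_car' fails. A minimal sketch of a guarded division is shown here, in case it helps to get through evaluation while debugging. The helper name safe_ratio is hypothetical and not part of the VAD code, the dictionary is a trimmed copy of the values above with the tensor replaced by a plain float, and returning NaN for an empty count is just one possible convention.

```python
def safe_ratio(total, count):
    """Return total / count, or NaN when count is zero.

    Hypothetical helper for illustration; not part of the VAD repository.
    """
    total = float(total)
    count = float(count)
    if count == 0.0:
        # No samples were counted for this class, so the average is undefined.
        return float("nan")
    return total / count


# Trimmed copy of the values printed in the debugger above
# (the tensor is replaced by a plain float for this sketch).
all_metric_dict = {
    "cnt_ade_car": 0.0,
    "cnt_ade_pedestrian": 2.0,
    "ADE_car": 0.0,
    "ADE_pedestrian": 2.8956,
}

result_dict = {}
for cls in ("car", "pedestrian"):
    result_dict["ADE_" + cls] = safe_ratio(
        all_metric_dict["ADE_" + cls], all_metric_dict["cnt_ade_" + cls]
    )

print(result_dict)
```

This only keeps the evaluation from crashing. A NaN for a class still means no samples were counted for it, so the underlying training problem remains.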

Is this expected? I am only training on the mini dataset (just trying things out), so I suspect the limited data hurts the model's performance and leads to these zero values. When I use your checkpoint, everything works fine. What is the correct file/procedure for evaluating the two-stage model?

Thank you so much!

alfredgu001324 · Answer 1 · Thu Aug 31 2023 22:41:29 GMT+0800 (China Standard Time)

I also tried training end-to-end on the mini dataset, and the problem still occurs. So I am guessing the mini dataset is too small to train a model that gives even minimal results for exploration?

Bo Jiang · Answer 2 · Thu Sep 07 2023 10:59:24 GMT+0800 (China Standard Time)

Yes, the nuScenes mini dataset is very small, and I think the original epoch settings are insufficient for convergence. If you want to try the mini dataset, I think you will need to train for many more epochs.
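
As a rough illustration of the suggestion above, here is what raising the epoch count might look like, assuming the VAD configs follow the common mmcv/mmdet convention of a runner dict with a max_epochs field. The thread does not confirm the exact field names, and the value below is an arbitrary placeholder, not a number recommended by anyone in this issue.

```python
# Hypothetical excerpt of a training config; field names assume the usual
# mmcv/mmdet 2.x convention and should be checked against the actual config file.
total_epochs = 100  # arbitrary placeholder; the thread gives no specific number
runner = dict(type="EpochBasedRunner", max_epochs=total_epochs)
```

Other settings tied to training length (for example the evaluation interval or learning-rate schedule) may need adjusting as well if the config defines them.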