Zero recall value while evaluating on LMO dataset
supriya-gdptl opened this issue · comments
Hello @wangg12
I tried to evaluate the GDR-Net model on the LMO dataset using the pretrained models you shared on OneDrive.
I used the following command to run the evaluation:
python core/gdrn_modeling/main_gdrn.py --config-file configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py \
--num-gpus 1 \
--eval-only \
--opts MODEL.WEIGHTS=output/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e/gdrn_lmo_real_pbr.pth
However, it is showing zero recall values. Please see the screenshot below.
Could you please help?
Maybe you should check your full running log to see where the problem is.
The features returned by the backbone (line 121 in GDRN.py) are a tensor of zeros. Because of this, every subsequent step also outputs zero tensors.
features =
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]], device='cuda:0')
The log says all weights (backbone, pnp_net and rot_head) from the checkpoint are loaded correctly. Still, the output of the backbone is a zero tensor.
Have you encountered this error before? Do you know what might be causing it?
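One way to narrow this down is to check the weights themselves rather than the activations. The snippet below is only a minimal sketch in plain PyTorch, not code from the GDR-Net repository: `find_all_zero_tensors` is a hypothetical helper, and the small `nn.Sequential` stands in for the real model. Note that in a real network some entries are legitimately zero (for example a freshly initialized BatchNorm `num_batches_tracked`), so the interesting hits are convolution and linear weights.

```python
import torch
from torch import nn


def find_all_zero_tensors(model: nn.Module):
    """Return the names of parameters and buffers whose entries are all zero.

    Hypothetical helper for debugging; not part of GDR-Net.
    """
    zero_names = []
    for name, tensor in model.state_dict().items():
        # Skip empty tensors; flag tensors that contain no non-zero entry.
        if tensor.numel() > 0 and not torch.any(tensor != 0):
            zero_names.append(name)
    return zero_names


# Toy stand-in for the real model (assumption: any nn.Module works the same way).
toy = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, bias=False), nn.ReLU())
print(find_all_zero_tensors(toy))  # freshly initialized conv weights are non-zero

# Simulate the failure mode: something overwrites the weights with zeros.
with torch.no_grad():
    toy[0].weight.zero_()
print(find_all_zero_tensors(toy))  # now lists the zeroed conv weight
```

Calling such a helper right after the checkpoint is loaded and again just before the forward pass would show whether the weights were zero from the start or were changed somewhere in between.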
See the log below:
20221204_124725|fvcore.common.checkpoint@152: [Checkpointer] Loading from D:/research/data/gdrnet_data/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_blender_160e/model_final_wo_optim.pth ...
20221204_124728|d2.checkpoint.c2_model_loading@324: Following weights matched with model:
| Names in Model | Names in Checkpoint | Shapes |
|:--------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------|
| backbone.bn1.* | backbone.bn1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,) |
| backbone.conv1.weight | backbone.conv1.weight | (64, 3, 7, 7) |
| backbone.layer1.0.bn1.* | backbone.layer1.0.bn1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,) |
| backbone.layer1.0.bn2.* | backbone.layer1.0.bn2.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,) |
| backbone.layer1.0.conv1.weight | backbone.layer1.0.conv1.weight | (64, 64, 3, 3) |
| backbone.layer1.0.conv2.weight | backbone.layer1.0.conv2.weight | (64, 64, 3, 3) |
| backbone.layer1.1.bn1.* | backbone.layer1.1.bn1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,) |
| backbone.layer1.1.bn2.* | backbone.layer1.1.bn2.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,) |
| backbone.layer1.1.conv1.weight | backbone.layer1.1.conv1.weight | (64, 64, 3, 3) |
....
| pnp_net.fc1.* | pnp_net.fc1.{bias,weight} | (1024,) (1024,8192) |
| pnp_net.fc2.* | pnp_net.fc2.{bias,weight} | (256,) (256,1024) |
| pnp_net.fc_r.* | pnp_net.fc_r.{bias,weight} | (6,) (6,256) |
| pnp_net.fc_t.* | pnp_net.fc_t.{bias,weight} | (3,) (3,256) |
.....
| rot_head_net.features.0.weight | rot_head_net.features.0.weight | (512, 256, 3, 3) |
| rot_head_net.features.1.* | rot_head_net.features.1.{bias,num_batches_tracked,running_mean,running_var,weight} | (256,) () (256,) (256,) (256,) |
| rot_head_net.features.10.weight | rot_head_net.features.10.weight | (256, 256, 3, 3) |
Thank you,
Supriya
Yes. But I installed from source. It seems you were running on Windows. Could you run the code on Ubuntu?
Thank you for the suggestion @wangg12.
I figured out the issue.
The features from the backbone were zero because the backbone weights were zero.
The checkpoint was loaded correctly, but for some unknown reason line 550 in gdrn_evaluator.py was resetting the weights to zero.
I resolved the issue by loading the checkpoint again after line 550.
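For anyone hitting the same symptom, the idea of detecting the reset and reloading afterwards can be sketched as follows. This is a generic PyTorch illustration under stated assumptions, not the actual change made in gdrn_evaluator.py: `suspicious_step` is a hypothetical stand-in for whatever line 550 does, `changed_tensors` is a hypothetical helper, and an in-memory snapshot replaces the real checkpoint file.

```python
import torch
from torch import nn


def changed_tensors(before, after):
    """Return names of state_dict entries that differ between two snapshots."""
    return [name for name, t in before.items() if not torch.equal(t, after[name])]


def suspicious_step(model: nn.Module) -> None:
    """Hypothetical stand-in for the call that resets the weights."""
    with torch.no_grad():
        for p in model.parameters():
            p.zero_()


model = nn.Linear(4, 2)  # toy stand-in for the real network

# Snapshot the loaded weights (clone, so later in-place changes do not affect it).
snapshot = {k: v.clone() for k, v in model.state_dict().items()}

suspicious_step(model)
print(changed_tensors(snapshot, model.state_dict()))  # names of overwritten tensors

# Workaround: load the weights again after the step that overwrote them.
model.load_state_dict(snapshot)
print(changed_tensors(snapshot, model.state_dict()))  # [] once restored
```

Reloading is a workaround rather than a fix; comparing snapshots around the suspicious call at least confirms which step modifies the weights.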
I got the following result.
Could you please tell me what each metric in the first column stands for, i.e. ad_2, rete_2, re_2, te_2, proj_2, re and te?
Thank you,
Supriya
You can find what those metrics mean here: https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/main/core/gdrn_modeling/gdrn_custom_evaluator.py#L772