No key word batch_cls_preds

andy690 opened this issue 2 years ago.

When training M3DETR, I get an error in VoxelSetAbstractionTransFusionv5: the key batch_cls_preds does not exist in batch_dict.

The specific code is as follows:

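import torch

def reduce_points(self, batch_dict):
    # Reads 'batch_cls_preds' and 'batch_box_preds', which the dense head writes
    # into batch_dict, so the dense head must run before this module.
    points = batch_dict['points']
    batch_indices = points[:, 0].long()
    # Full-length mask, so points do not need to be sorted by batch index
    keep = torch.zeros(points.shape[0], dtype=torch.bool, device=points.device)

    for bs_idx, cls_preds in enumerate(batch_dict['batch_cls_preds']):
        bs_mask = (batch_indices == bs_idx)
        pts = points[bs_mask][:, 1:4].unsqueeze(dim=1)  # (N, 1, 3) xyz of this sample
        # Best class score per predicted box, then the top-k boxes by that score
        scores, _ = torch.max(cls_preds, dim=1)
        _, idx = torch.topk(scores, min(self.topks, scores.shape[0]))
        centers = batch_dict['batch_box_preds'][bs_idx][idx][:, :3].unsqueeze(dim=0)  # (1, K, 3)
        # Squared distance from each point to its nearest selected box center
        dist, _ = ((pts - centers) ** 2).sum(dim=-1).min(dim=1)  # (N,)
        # dist is squared, so reduce_radius acts as a squared radius here
        keep[bs_mask] = dist <= self.reduce_radius

    batch_dict['points'] = points[keep]
    return batch_dict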
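Stripped of PyTorch and batching, the reduce_points step above keeps only the points that lie near the centers of the top-k highest-scoring predicted boxes. The pure-Python sketch below shows that logic for a single sample. The function name reduce_points_single and its parameters are hypothetical, and radius_sq assumes reduce_radius is compared against squared distances, as it is in the code above.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def reduce_points_single(points: Sequence[Point],
                         cls_scores: Sequence[Sequence[float]],
                         box_centers: Sequence[Point],
                         topk: int,
                         radius_sq: float) -> List[Point]:
    """Keep points whose squared distance to the nearest top-k box center is <= radius_sq."""
    # Best class score for each predicted box
    best = [max(s) for s in cls_scores]
    # Indices of the top-k boxes by that score (fewer if there are fewer boxes)
    top_idx = heapq.nlargest(min(topk, len(best)), range(len(best)), key=best.__getitem__)
    centers = [box_centers[i] for i in top_idx]
    if not centers:
        # No predicted boxes: nothing to keep points around
        return []
    kept = []
    for p in points:
        # Squared distance to the closest selected center
        d_min = min(sum((pc - cc) ** 2 for pc, cc in zip(p, c)) for c in centers)
        if d_min <= radius_sq:
            kept.append(p)
    return kept
```

For example, with points [(0, 0, 0), (1, 0, 0), (10, 10, 10)], topk=1 and radius_sq=1.0, only the two points near the single best box center survive. Raising topk adds more centers, so more points are kept.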

It seems to me that batch_cls_preds should not exist during training; it should only be present during inference. What is your opinion?

Tianrui Guan · Answer 1 · Sat Aug 20 2022 02:53:30 GMT+0800 (China Standard Time)

Hi,

We have never run into this issue before. Could you provide a few more details? For example, does this happen in the middle of training or at the beginning? Do you have the problem only during training and not during inference?

We'll also look into it.

Tianrui Guan · Answer 2 · Sat Aug 20 2022 04:57:36 GMT+0800 (China Standard Time)

I just ran the command "bash ./scripts/dist_train.sh 2 --cfg_file ./cfgs/m3detr_models/m3detr_kitti.yaml --workers 4" without any issues.

Could you make sure that you installed the code correctly? There may be conflicting packages. Have you run "python setup.py develop" after cd-ing into the repo?

andy690 · Answer 3 · Sat Aug 20 2022 10:30:32 GMT+0800 (China Standard Time)

Thanks for your prompt reply; I have found the problem. I integrated the code into my own OpenPCDet copy but did not modify detector_template.py, and that is where the problem came from:

# Order used by M3DETR: dense_head runs before pfe, so 'batch_cls_preds'
# already exists when the pfe (VoxelSetAbstractionTransFusionv5) runs
self.module_topology = [
    'vfe', 'backbone_3d', 'map_to_bev_module',
    'backbone_2d', 'dense_head', 'pfe', 'point_head', 'roi_head'
]
# Original order in my detector_template.py: pfe runs before dense_head
# self.module_topology = [
#     'vfe', 'backbone_3d', 'map_to_bev_module', 'pfe',
#     'backbone_2d', 'dense_head', 'point_head', 'roi_head'
# ]

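This slight difference changes the order in which the network modules are built and run. With the original order, the pfe module runs before dense_head, so batch_cls_preds is not yet in batch_dict when reduce_points reads it. The problem happens at the beginning of training.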
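The effect of the two orders can be reproduced without any detector code. The sketch below assumes, as in OpenPCDet, that the model calls its modules in module_topology order and that they share one batch_dict. The module functions here are hypothetical stand-ins that only read and write the relevant keys.

```python
from typing import Callable, Dict, List

BatchDict = Dict[str, object]

def dense_head(batch: BatchDict) -> BatchDict:
    # Stand-in for the dense head: writes (placeholder) box predictions
    batch['batch_cls_preds'] = 'cls'
    batch['batch_box_preds'] = 'box'
    return batch

def pfe(batch: BatchDict) -> BatchDict:
    # Stand-in for the M3DETR pfe: reads the dense-head predictions first
    _ = batch['batch_cls_preds']
    batch['point_features'] = 'feat'
    return batch

def passthrough(batch: BatchDict) -> BatchDict:
    # Modules that do not matter for this issue
    return batch

MODULES: Dict[str, Callable[[BatchDict], BatchDict]] = {
    'vfe': passthrough, 'backbone_3d': passthrough, 'map_to_bev_module': passthrough,
    'backbone_2d': passthrough, 'dense_head': dense_head, 'pfe': pfe,
    'point_head': passthrough, 'roi_head': passthrough,
}

def run(topology: List[str]) -> BatchDict:
    # Call every module in topology order on a shared batch_dict
    batch: BatchDict = {}
    for name in topology:
        batch = MODULES[name](batch)
    return batch

M3DETR_ORDER = ['vfe', 'backbone_3d', 'map_to_bev_module',
                'backbone_2d', 'dense_head', 'pfe', 'point_head', 'roi_head']
ORIGINAL_ORDER = ['vfe', 'backbone_3d', 'map_to_bev_module', 'pfe',
                  'backbone_2d', 'dense_head', 'point_head', 'roi_head']
```

run(M3DETR_ORDER) completes, while run(ORIGINAL_ORDER) raises KeyError('batch_cls_preds') at the very first forward pass. That matches the failure appearing at the beginning of training.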