rayguan97 / M3DETR

Code base for M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

No key word batch_cls_preds

andy690 opened this issue

When training M3DETR, I get an error saying that there is no key "batch_cls_preds". It happens in VoxelSetAbstractionTransFusionv5.

The specific code is as follows:

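def reduce_points(self, batch_dict):
    # Column 0 of 'points' holds the batch index of each point.
    batch_indices = batch_dict['points'][:, 0].long()

    masks = []
    for bs_idx, roi in enumerate(batch_dict['batch_cls_preds']):
        bs_mask = (batch_indices == bs_idx)
        pts = batch_dict['points'][bs_mask].unsqueeze(dim=1)[:, :, 1: 4]  # (N, 1, 3)
        # Best class score per predicted box, then the top-k boxes by that score.
        s, _ = torch.max(batch_dict['batch_cls_preds'][bs_idx], dim=1)
        top, idx = torch.topk(s, self.topks)
        # Centers (x, y, z) of the selected boxes: (1, topks, 3)
        c = batch_dict['batch_box_preds'][bs_idx][idx][:, :3].unsqueeze(dim=0)
        dist = (pts - c)**2

        # Squared distance from each point to its nearest selected center: (N,)
        dist, _ = dist.sum(dim=-1).min(dim=1)
        mask = (dist <= self.reduce_radius)
        masks.extend(mask)

    # Keep only the points that lie close to a selected box center.
    batch_dict['points'] = batch_dict['points'][masks]
    return batch_dict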

It seems to me that "batch_cls_preds" should not be present during the training phase; it should only be present in the inference phase. I would like to ask your opinion.
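One way to confirm which keys are available when reduce_points runs is to check batch_dict up front. The following is only a minimal sketch: `missing_keys` is a hypothetical helper, not part of M3DETR or OpenPCDet, and the key names are simply the ones read in the snippet above.

```python
def missing_keys(batch_dict,
                 required=("points", "batch_cls_preds", "batch_box_preds")):
    """Return the required keys that are absent from batch_dict.

    Hypothetical debugging helper; the default key names are the ones
    that reduce_points reads in the snippet quoted above.
    """
    return [key for key in required if key not in batch_dict]


# Example: a batch_dict as it might look before any predictions were added.
example = {"points": [[0, 1.0, 2.0, 3.0]]}
absent = missing_keys(example)
if absent:
    print(f"reduce_points would fail; missing keys: {absent}")
```

Calling such a check at the top of reduce_points would show whether the predictions are missing from the very first batch or only in some iterations.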

Hi,

We have never run into this issue before. Could you provide a few more details? For example, does this happen in the middle of training or at the beginning? Do you have the problem only during training, and not during inference?

We'll also look into it.

I just ran the cmd "bash ./scripts/dist_train.sh 2 --cfg_file ./cfgs/m3detr_models/m3detr_kitti.yaml --workers 4" without any issue.

Could you make sure that you installed the code correctly? Maybe there are conflicting packages? Have you run "python setup.py develop" after changing into the repo directory?
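A quick way to check for a conflicting install is to ask Python where it would import the package from. This is a sketch using only the standard library; `locate_package` is a hypothetical helper, and "pcdet" is assumed to be the package name used by this code base.

```python
import importlib.util


def locate_package(name):
    """Return the file Python would import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Assumption: the package installed by "python setup.py develop" is "pcdet".
print(locate_package("pcdet"))
```

If the printed path points to a different checkout than the one being edited, another installation is shadowing it.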

Thanks for your prompt reply. I have found the problem: I integrated the code into my own OpenPCDet and did not modify detector_template.py, and this is where the problem occurred. Note where 'pfe' appears in each list:

self.module_topology = [
    'vfe', 'backbone_3d', 'map_to_bev_module',
    'backbone_2d', 'dense_head', 'pfe', 'point_head', 'roi_head'
]
# self.module_topology = [
#     'vfe', 'backbone_3d', 'map_to_bev_module', 'pfe',
#     'backbone_2d', 'dense_head', 'point_head', 'roi_head'
# ]

This slight difference changes the order in which the network modules are built. The problem happens at the beginning of training.
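The effect of the ordering can be reproduced with stub modules. This is a toy sketch, not the real detector: `run_pipeline` and both stubs are hypothetical, and it assumes that 'dense_head' is the module that writes "batch_cls_preds" and "batch_box_preds" into batch_dict, while 'pfe' is the one that reads them.

```python
def run_pipeline(topology):
    """Run stub modules in the given order and return the final batch_dict."""

    def dense_head(batch_dict):
        # Stand-in: assumed to be the module that adds the predictions.
        batch_dict["batch_cls_preds"] = [[0.9, 0.1]]
        batch_dict["batch_box_preds"] = [[0.0, 0.0, 0.0]]

    def pfe(batch_dict):
        # Stand-in for reduce_points, which reads the predictions.
        _ = batch_dict["batch_cls_preds"]

    stubs = {"dense_head": dense_head, "pfe": pfe}
    batch_dict = {"points": []}
    for name in topology:
        # Modules that are not modelled here are treated as no-ops.
        stubs.get(name, lambda d: None)(batch_dict)
    return batch_dict


# The two orderings from the post above.
pfe_after_dense_head = [
    "vfe", "backbone_3d", "map_to_bev_module",
    "backbone_2d", "dense_head", "pfe", "point_head", "roi_head",
]
pfe_before_dense_head = [
    "vfe", "backbone_3d", "map_to_bev_module", "pfe",
    "backbone_2d", "dense_head", "point_head", "roi_head",
]

run_pipeline(pfe_after_dense_head)  # runs without error
try:
    run_pipeline(pfe_before_dense_head)
except KeyError as err:
    print(f"KeyError: {err}")  # 'pfe' ran before the predictions existed
```

Under these assumptions, the failure appears on the very first forward pass, which matches the error occurring at the beginning of training.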