Pointcept / Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Training of OA-CNN with custom data fails on flat surfaces

meyerjo opened this issue

Hi, first of all thanks for the nice repository.

I am trying to train various models on custom data. However, with OA-CNN I encounter the problem below.

Traceback (most recent call last):
  File "/workspace/Pointcept/tests/test_models.py", line 216, in test_oacnn
    self._model_dict_for_val_loader(self.model_definition_dict["oacnn"])
  File "/workspace/Pointcept/tests/test_models.py", line 209, in _model_dict_for_val_loader
    output_dict = model(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/default.py", line 20, in forward
    seg_logits = self.backbone(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 324, in forward
    x = self.enc[i](x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 158, in forward
    x = self.down(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 755, in forward
    return self._conv_forward(self.training,
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 408, in _conv_forward
    raise e
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 385, in _conv_forward
    res = ops.get_indice_pairs_implicit_gemm(
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/ops.py", line 460, in get_indice_pairs_implicit_gemm
    raise ValueError(
ValueError: your out spatial shape [12, 12, 0] reach zero!!! input shape: [25, 25, 1]

The number of input files on which this error occurs depends on the grid size I choose for sampling. For example, with a grid size of 0.1 it fails on 17/174 files; with a grid size of 0.01 it fails on only 4/174 files.

I took a look at grid_coord and noticed that every file on which it fails has a range (max grid_coord - min grid_coord) of less than 15 in the z-component. With a range of 15 or more it works fine. All failure cases also correspond to flat surfaces, e.g. streets. See the example below (the blue background comes from visualizing with CloudCompare).
[Image: example of a failing flat street surface, visualized in CloudCompare]
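The threshold of 15 matches the standard convolution output-size formula. The sketch below is only an illustration under assumptions that are not confirmed in this thread: each downsampling layer uses kernel size 2, stride 2 and no padding, there are four such layers, and the z extent of the sparse shape is the z-range plus 1. The helper names `conv_out_size` and `downsample_chain` are hypothetical.

```python
def conv_out_size(size, kernel=2, stride=2, padding=0, dilation=1):
    """Standard convolution output-size formula for a single axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def downsample_chain(size, num_stages=4):
    """Extent along one axis before and after each assumed downsampling stage."""
    sizes = [size]
    for _ in range(num_stages):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes


# The step reported in the traceback: [25, 25, 1] -> [12, 12, 0]
print([conv_out_size(s) for s in (25, 25, 1)])  # [12, 12, 0]

# z-range 14 gives an extent of 15 cells, which collapses at the fourth stage
print(downsample_chain(15))  # [15, 7, 3, 1, 0]

# z-range 15 gives an extent of 16 cells, which survives all four stages
print(downsample_chain(16))  # [16, 8, 4, 2, 1]
```

Under these assumptions, any axis with fewer than 16 cells reaches zero before the last stage finishes, which is consistent with failures occurring only for flat scenes.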

Do you have any workaround so that OA-CNN can be trained and (especially) evaluated on all data?

Hi, this is caused by spconv. See the discussion in #236.

Thanks! I changed torch.add(..., 1) to torch.add(..., 96), as suggested in the other issue, and it worked. Wouldn't it be better to expose this value as a configuration parameter?
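To illustrate why a larger offset helps, here is a rough pure-Python sketch of the arithmetic. It is not the model code. It assumes that the sparse shape is the per-axis maximum grid coordinate plus the offset, and that the encoder has four downsampling layers with kernel size 2 and stride 2. The helper names and the example coordinates `[200, 200, 8]` are hypothetical.

```python
def conv_out_size(size, kernel=2, stride=2):
    """Output extent of an unpadded strided convolution along one axis."""
    return (size - (kernel - 1) - 1) // stride + 1


def final_shape(max_grid_coord, offset, num_stages=4):
    """Sparse shape after all assumed downsampling stages for a given offset."""
    shape = [m + offset for m in max_grid_coord]  # stands in for torch.add(..., offset)
    for _ in range(num_stages):
        shape = [conv_out_size(s) for s in shape]
    return shape


flat_scene = [200, 200, 8]  # hypothetical maximum grid coordinates of a flat scene

print(final_shape(flat_scene, offset=1))   # [12, 12, 0]  -> z collapses, spconv raises
print(final_shape(flat_scene, offset=96))  # [18, 18, 6]  -> every axis stays positive
```

Under these assumptions an offset of 16 would already survive four halvings even for a perfectly flat input, so 96 leaves a comfortable margin.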

I think the value can be fixed at 96. I will modify the model code later.