Training of OA-CNN with custom data fails on flat surfaces

Question

meyerjo opened this issue 3 months ago

Hi, first of all thanks for the nice repository.

I am trying to train various models on custom data. With OA-CNN, however, I run into the following error:

Traceback (most recent call last):
  File "/workspace/Pointcept/tests/test_models.py", line 216, in test_oacnn
    self._model_dict_for_val_loader(self.model_definition_dict["oacnn"])
  File "/workspace/Pointcept/tests/test_models.py", line 209, in _model_dict_for_val_loader
    output_dict = model(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/default.py", line 20, in forward
    seg_logits = self.backbone(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 324, in forward
    x = self.enc[i](x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 158, in forward
    x = self.down(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 755, in forward
    return self._conv_forward(self.training,
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 408, in _conv_forward
    raise e
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 385, in _conv_forward
    res = ops.get_indice_pairs_implicit_gemm(
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/ops.py", line 460, in get_indice_pairs_implicit_gemm
    raise ValueError(
ValueError: your out spatial shape [12, 12, 0] reach zero!!! input shape: [25, 25, 1]

The number of input files that trigger this error depends on the grid size I choose for sampling: with a grid size of 0.1 it fails on 17/174 files, with a grid size of 0.01 on only 4/174 files.

I looked at grid_coord and noticed that every sample on which it fails has a z-range (max grid_coord - min grid_coord) of less than 15. With a range of 15 or more it works fine. All of the failure cases are flat surfaces, e.g. streets. See example below (the blue background comes from visualizing with CloudCompare).

Do you have any workaround so that OA-CNN can be trained and (especially) evaluated on all data?
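
The threshold of 15 is consistent with plain output-size arithmetic. The sketch below is not taken from the Pointcept or spconv code. It assumes that each downsampling layer behaves like an unpadded convolution with kernel size 2 and stride 2, that there are four such layers, and that the sparse shape along an axis is the grid_coord range plus 1. The function names are made up for illustration.

```python
def conv_out_size(size, kernel=2, stride=2, padding=0, dilation=1):
    """Standard convolution output-size formula for a single axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def shapes_through_stages(size, num_stages=4):
    """Axis length after each assumed downsampling stage (stops at zero)."""
    sizes = [size]
    for _ in range(num_stages):
        size = conv_out_size(size)
        sizes.append(size)
        if size <= 0:
            break
    return sizes


def survives(coord_range, num_stages=4):
    # Assumption: sparse shape along an axis is (max - min) + 1.
    return shapes_through_stages(coord_range + 1, num_stages)[-1] > 0


# The failing step from the traceback: [25, 25, 1] -> [12, 12, 0]
print([conv_out_size(s) for s in (25, 25, 1)])  # [12, 12, 0]

# A z-range of 14 collapses to zero, a z-range of 15 does not.
print(survives(14), survives(15))  # False True
```

Under these assumptions an axis of length 15 shrinks as 15, 7, 3, 1, 0, while an axis of length 16 shrinks as 16, 8, 4, 2, 1, which matches the observation that a range of 15 is the smallest one that works.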

Xiaoyang Wu · Answer 1 · Thu May 16 2024 15:28:24 GMT+0800 (China Standard Time)

Hi, this is caused by spconv. See the discussion in #236.

Johannes Meyer · Answer 2 · Fri May 17 2024 05:24:12 GMT+0800 (China Standard Time)

Thanks! I changed the torch.add(..., 1) to torch.add(..., 96) as in the other issue and it worked. Wouldn't it be better to expose this as a configuration parameter?
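
Why the larger constant helps can be sketched without spconv. The snippet assumes that the constant is added to the maximum grid_coord to form the sparse shape, and that the network applies four unpadded downsampling convolutions with kernel size 2 and stride 2. Both are assumptions that fit the numbers in this thread, not something read from the model code. sparse_shape and downsample are hypothetical helpers, not Pointcept functions.

```python
def sparse_shape(grid_coord, margin):
    """Per-axis sparse shape: maximum coordinate plus a constant margin."""
    return [max(axis) + margin for axis in zip(*grid_coord)]


def downsample(shape, num_stages=4, kernel=2, stride=2):
    """Apply the unpadded conv output-size formula num_stages times."""
    for _ in range(num_stages):
        # Clamp at zero so a collapsed axis stays collapsed.
        shape = [max((s - kernel) // stride + 1, 0) for s in shape]
    return shape


# A flat patch: wide in x and y, only 15 voxels tall in z (coords 0..14).
flat = [(x, y, z) for x in (0, 399) for y in (0, 399) for z in (0, 14)]

print(downsample(sparse_shape(flat, margin=1)))   # z axis collapses to 0
print(downsample(sparse_shape(flat, margin=96)))  # every axis stays positive
```

With these assumptions any margin of at least 16 would survive four halvings even for a perfectly flat sample. 96 is simply the value used in the other issue.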

Xiaoyang Wu · Answer 3 · Fri May 17 2024 12:00:26 GMT+0800 (China Standard Time)

I think the parameter can be fixed at 96. I will modify the model code later.