[BUG] <problems encountered when reproducing artifact evaluation>
hua0x522 opened this issue
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I encountered problems when reproducing the artifact evaluation (AE) of TorchSparse++. I downloaded the code from https://zenodo.org/records/8311889 and used the preprocessed datasets provided by the authors.
(spconv) wxz@gpu4:~/torchsparse/torchsparse-artifact-micro-main/artifact-p1/evaluation$ CUDA_LAUNCH_BLOCKING=1 python evaluate.py
[Warning] The current device does not support fp16. Set precision to fp32
Traceback (most recent call last):
File "evaluate.py", line 301, in <module>
main()
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "evaluate.py", line 220, in main
_ = model(inputs)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/torchsparse/torchsparse-artifact-micro-main/artifact-p1/evaluation/core/models/segmentation_models/minkunet.py", line 104, in forward
x3 = self.stage3(x2)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/torchsparse/torchsparse-artifact-micro-main/artifact-p1/evaluation/core/models/modules/layers_3d.py", line 125, in forward
x = self.relu(self.net(x) + self.downsample(x))
File "/home/wxz/torchsparse/torchsparse/tensor.py", line 109, in __add__
feats=self.feats + other.feats,
RuntimeError: CUDA error: invalid configuration argument
My GPU is GPU 0: Tesla V100-PCIE-32GB (UUID: GPU-b57016fe-8dca-4290-b860-a09e19c8fb30)
Before encountering this problem, I got this one firstly:
(spconv) wxz@gpu4:~/torchsparse/torchsparse-artifact-micro-main/artifact-p1/evaluation$ python evaluate.py
[Warning] The current device does not support fp16. Set precision to fp32
Traceback (most recent call last):
File "evaluate.py", line 301, in <module>
main()
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "evaluate.py", line 220, in main
_ = model(inputs)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/torchsparse/torchsparse-artifact-micro-main/artifact-p1/evaluation/core/models/segmentation_models/minkunet.py", line 101, in forward
x0 = self.stem(x)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/wxz/miniconda3/envs/spconv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wxz/torchsparse/torchsparse/nn/modules/conv.py", line 98, in forward
return F.conv3d(
File "/home/wxz/torchsparse/torchsparse/nn/functional/conv/conv.py", line 47, in conv3d
dataflow = config.dataflow
AttributeError: 'dict' object has no attribute 'dataflow'
I tried to fix it by simply ignoring the config passed into torchsparse/nn/functional/conv/conv.py:
# torchsparse/nn/functional/conv/conv.py: line 37
config = None  # added: discard the dict passed in by the caller
if config is None:
    config = F.conv_config.get_global_conv_config()
    if config is None:
        config = F.conv_config.get_default_conv_config(
            conv_mode=conv_mode, training=training
        )
# TODO: Deal with kernel volume > 32. (Split mask or unsort)
dataflow = config.dataflow
kmap_mode = config.kmap_mode
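For illustration of why the AttributeError occurs: a plain dict does not expose its keys as attributes, while conv3d here expects an object with a .dataflow attribute. A minimal standalone sketch (the config keys below are hypothetical, not TorchSparse's actual schema):

```python
from types import SimpleNamespace

# Hypothetical config keys, for illustration only.
config = {"dataflow": "implicit_gemm", "kmap_mode": "hashmap"}

# Attribute access on a plain dict fails, which is exactly
# the failure shown in the traceback above.
try:
    config.dataflow
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'dataflow'

# Wrapping the dict in a SimpleNamespace provides attribute access:
cfg = SimpleNamespace(**config)
print(cfg.dataflow)
```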
Expected Behavior
No response
Environment
- GCC:9.3.0
- NVCC:11.3
- PyTorch:1.10.0+cu113
- PyTorch CUDA:11.3
- TorchSparse:2.1.0
Anything else?
No response
Hi @hua0x522, thank you for your interest in TorchSparse! Did you build the Docker container for the artifact evaluation? It looks like you are running it in your local environment. The problem is that you have installed TorchSparse v2.1.0, while you are running the benchmark code for TorchSparse v2.0 (in the folder artifact-p1).
To run the benchmark code for v2.1.0, you should switch to the folder artifact-p2 and revert your change to torchsparse/nn/functional/conv/conv.py. Additionally, I strongly recommend that you follow the README.md in artifact-p2 and build the Docker container for the benchmark evaluation.
Finally, the GPU you are using might be a bit too old (it does not support fp16 arithmetic), which means you may not be able to reproduce the figures in our paper with this GPU.
Thank you.
Thank you for your patient explanation. Now I can correctly execute the AE code in artifact-p2.
Hello author, I found that in the evaluation, the MinkUNet model outputs of SpConv and TorchSparse++ differ (artifact-p2 evaluate.py; the cosine similarity between the model outputs is approximately 0.81). I made sure each backend used the same input point clouds. Meanwhile, the cosine similarity between the ME and TorchSparse++ outputs is approximately 0.99. I am not very familiar with this field and may have made some naive mistakes. Looking forward to your reply.
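For reference, the comparison metric described above can be sketched as follows; this is a minimal, torch-free illustration (the feature vectors are made up, not actual model outputs), assuming the outputs are flattened before comparison:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up outputs: identical vectors score (numerically) close to 1.0,
# so a value like ~0.81 between two backends indicates their
# outputs genuinely diverge, not just by rounding noise.
ref = [0.5, -1.2, 3.0, 0.7]
print(cosine_similarity(ref, ref))       # close to 1.0
print(cosine_similarity(ref, [x + 1.0 for x in ref]))  # below 1.0
```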