jhultman / vision3d

Research platform for 3D object detection in PyTorch.

problem with spconv

muzi2045 opened this issue

Hi, I'm using PyTorch 1.4 + CUDA 10.0 + spconv 1.1 when training on the nuScenes dataset.
But there is a bug in the sparse conv layer when using the spconv lib:

[error screenshot]

Have you encountered this problem with spconv 1.0? (It looks like the error only happens sometimes.)
I'm hoping for a solution that allows using the newest versions of the dependencies.

See this related issue: traveller59/spconv#74

Hi, I have read about this problem but I have never encountered it (my spconv fork is v1.0, based on commit 7342772). Have you noticed any speed gain with spconv v1.1? I have read that the cuhash implementation in v1.1 results in much higher memory consumption. I will test this myself to make sure, because I would prefer to use v1.1 as long as it has no memory problems.
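For the memory comparison, here is a rough sketch of how I would measure peak GPU memory per spconv version. This assumes spconv's v1.x Python API (`SparseConvTensor`, `SubMConv3d`, `SparseConv3d`); the grid shape, voxel counts, and channel sizes below are arbitrary placeholders, not values from vision3d:

```python
import torch
import spconv

def measure_peak_memory(batch_size, num_voxels=20000, num_features=4):
    """One forward pass through a small sparse conv stack, then report
    peak GPU memory. All shapes here are illustrative placeholders."""
    torch.cuda.reset_max_memory_allocated()
    spatial_shape = [41, 1600, 1408]  # placeholder voxel grid
    n = batch_size * num_voxels
    features = torch.randn(n, num_features).cuda()
    # indices: (N, 4) int32 rows of (batch_idx, z, y, x)
    indices = torch.stack([
        torch.randint(0, batch_size, (n,)),
        torch.randint(0, spatial_shape[0], (n,)),
        torch.randint(0, spatial_shape[1], (n,)),
        torch.randint(0, spatial_shape[2], (n,)),
    ], dim=1).int().cuda()
    x = spconv.SparseConvTensor(features, indices, spatial_shape, batch_size)
    net = spconv.SparseSequential(
        spconv.SubMConv3d(num_features, 16, 3, indice_key="subm0"),
        spconv.SparseConv3d(16, 32, 3, stride=2),
    ).cuda()
    net(x)  # forward only; activations are kept, roughly like training
    torch.cuda.synchronize()
    print(f"batch_size={batch_size}: "
          f"{torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB peak")

for bs in (1, 2):
    measure_peak_memory(bs)
```

Running this once against a v1.0 build and once against a v1.1 build should show whether the cuhash path really uses much more memory.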

If you want to use v1.1, maybe you can try the fix table_size_ = unsigned(floor(max_table_entries * space_usage) + 1) mentioned in the thread you linked?
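And if you rebuild with that patch, a quick stress loop like the following could check whether the intermittent hash-table crash still reproduces. Again just a sketch; the grid shape and channel counts are arbitrary, and a new `SparseConvTensor` per iteration forces the indice pairs (and hash table) to be rebuilt each time:

```python
import torch
import spconv

net = spconv.SparseSequential(
    spconv.SubMConv3d(4, 16, 3, indice_key="subm0"),
).cuda()

spatial_shape = [41, 1600, 1408]  # placeholder voxel grid
for step in range(200):
    n = int(torch.randint(10000, 60000, (1,)))  # vary sparsity each step
    features = torch.randn(n, 4).cuda()
    indices = torch.stack([
        torch.zeros(n, dtype=torch.int64),      # single batch element
        torch.randint(0, spatial_shape[0], (n,)),
        torch.randint(0, spatial_shape[1], (n,)),
        torch.randint(0, spatial_shape[2], (n,)),
    ], dim=1).int().cuda()
    x = spconv.SparseConvTensor(features, indices, spatial_shape, batch_size=1)
    net(x)
torch.cuda.synchronize()
print("no crash after 200 iterations")
```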

Yes, I'm testing the fixed code now. Hoping for no crashes during training...

I am testing spconv 1.0 from the fork https://github.com/jhultman/spconv
and the newest spconv from https://github.com/traveller59/spconv, training on the same nuScenes dataset.

Here are some interesting findings:

- With spconv 1.1, an unknown bug in hash_table.cpp sometimes leads to a crash, and the fix in hash_table.cpp doesn't help; otherwise, training with batch_size=2 works.
- After switching to spconv 1.0, batch_size=2 leads to a CUDA out of memory error.
- With batch_size=1, the GPU memory consumption on a single 2080 Ti is shown below:

[GPU memory screenshot]

My testing environment:

- PyTorch 1.4
- spconv 1.0
- CUDA 10.0
- Python 3.6.8
- Ubuntu 16.04
@jhultman