problem with spconv
muzi2045 opened this issue · comments
Hi, I'm using PyTorch 1.4 + CUDA 10.0 + spconv 1.1 when training on the Nuscenes dataset.
But there is a bug in the sparse conv layer when using the spconv lib.
Have you encountered this problem with spconv 1.0? (It looks like the error only happens sometimes.)
I'm hoping for a solution that works with the newest dependency versions.
Refer to traveller59/spconv#74
Hi, I have read about this problem but I have never encountered it (my spconv fork is v1.0, based on commit 7342772). Have you noticed any speed gain with spconv v1.1? I have read that the cuhash implementation in v1.1 results in much higher memory consumption. I will test this to make sure, because I would prefer to use v1.1 as long as it has no memory problems.
If you want to use v1.1, maybe you can try the fix `table_size_ = unsigned(floor(max_table_entries * space_usage) + 1)` mentioned in the thread you linked?
Yes, I'm testing the fixed code. Hoping for no crashes during training....
I am testing spconv 1.0 from the fork https://github.com/jhultman/spconv
and the newest spconv from https://github.com/traveller59/spconv while training on the same Nuscenes dataset.
Here are some interesting findings:
When using spconv 1.1, the unknown bug in hash_table.cpp sometimes leads to a crash, and the fix in hash_table.cpp doesn't help; but training with batch_size=2 is OK.
When switching to spconv 1.0, batch_size=2 leads to a CUDA out-of-memory error.
With batch_size=1, the GPU memory consumption on a single 2080 Ti -->
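For anyone reproducing this comparison, a minimal way to record peak GPU memory per training step in PyTorch (a generic sketch, not tied to any particular spconv training script; `log_gpu_memory` is a hypothetical helper name):

```python
import torch

def log_gpu_memory(tag: str) -> float:
    """Print and return peak GPU memory (MiB) since the last reset.

    Returns 0.0 on CPU-only machines so the helper is safe everywhere.
    """
    if not torch.cuda.is_available():
        return 0.0
    peak_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
    print(f"[{tag}] peak GPU memory: {peak_mib:.1f} MiB")
    # Reset so the next call reports the peak of the next step only.
    torch.cuda.reset_peak_memory_stats()
    return peak_mib

# Usage inside a training loop (sketch):
# for step, batch in enumerate(loader):
#     loss = model(batch)
#     loss.backward()
#     optimizer.step()
#     log_gpu_memory(f"step {step}")
```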
My testing environment:
PyTorch 1.4
spconv 1.0
CUDA 10.0
Python 3.6.8
Ubuntu 16.04
@jhultman