traveller59 / spconv

Spatial Sparse Convolution Library


Questions about performance

Jaywxy opened this issue · comments

Amazing! I was using spconv 1.x before, but now I have switched to spconv 2.1, and it is amazing: one epoch used to take 3 hours to train, but now it takes only 1.5 hours, and GPU memory usage has dropped by about 40%. But I still have some questions that are unclear to me. Could you help me answer them or give me some suggestions?
These are the data and data types passed into the model. How can I modify them to make training more efficient?

Hi, can you share which model you trained and which profiling method you used to measure the training time? Thanks

The model I use belongs to a senior labmate, and I can't share it with you yet. You can get the training time by checking the training process; isn't it possible to record the time of each epoch?

Actually, I tried to measure the training and inference time of a single sparse convolution layer in many ways, such as time.time(), torch.cuda.Event record, and the PyTorch profiling tool, but I didn't see any improvement in the actual runtime. So may I ask what type of neural network you are using, or its name? I don't need you to share a copy with me.

Hi, have you solved it? I have the same problem.

I didn't solve my issue; the model I tested was a 2D sparse convolution.

But I have some recommendations for your code (see the sketch after this list):

  1. First, warm up your GPU before measuring the time. For example, run 50 epochs on the dense convolution net first, then run 100 epochs for both the dense and the sparse nets, and take the average training time of each.
  2. Try another time measurement method, for example torch.cuda.Event record, which you can look up on Google or ask ChatGPT about.
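
A minimal sketch of both points together, assuming spconv 2.x and a CUDA device (the layer shapes, iteration counts, and variable names below are made up for illustration):

import torch
import torch.nn as nn
import spconv.pytorch as spconv

device = "cuda:0"

# Toy layers just for the comparison.
dense = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False).to(device)
sparse = spconv.SparseConv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False).to(device)

# Mostly-zero input so the sparse path has something to exploit.
x_d = torch.zeros((2, 4, 1024, 1024), device=device)
x_d[0, 0, :16, :16] = 1.
x_s = spconv.SparseConvTensor.from_dense(x_d.permute(0, 2, 3, 1))

def avg_time_ms(mod, inp, warmup=50, iters=100):
    with torch.no_grad():
        for _ in range(warmup):      # warm-up: kernel selection, caches, GPU clocks
            mod(inp)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            mod(inp)
        end.record()
        torch.cuda.synchronize()     # wait for all queued kernels before reading
    return start.elapsed_time(end) / iters

print("dense :", avg_time_ms(dense, x_d), "ms")
print("sparse:", avg_time_ms(sparse, x_s), "ms")

Averaging over many iterations after a warm-up keeps one-time costs (cuDNN autotuning, spconv kernel selection) out of the measurement.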

Thanks for your quick reply!

Regarding the first point, have you tried doing this, and does it work?

It doesn't work for me. I have tried all the methods and techniques I know of for SparseConv2d, but I didn't test the 3D case. If you make any progress, please share it with me, thanks.

In fact, I printed the running time of some modules during my model's inference and found that they were not much more efficient than normal convolution. I still don't understand what the problem is.
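
If you want per-module timings during inference, a sketch with torch.profiler may be more trustworthy than printed timestamps, because it reports CUDA kernel time rather than only Python-side wall time (the model and input below are stand-ins; substitute your own modules):

import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Stand-in model and input, for illustration only.
model = nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda()
sample = torch.zeros(2, 4, 1024, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        with record_function("conv_block"):  # label the region you care about
            model(sample)

# Sort by CUDA time so asynchronous kernel launches are accounted for.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))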

Hi, I used torch.cuda.Event to measure the time and found no problem. Do you think this is the right approach, and did you do it this way before? Why is it not feasible to use the time library?

import torch
import torch.nn as nn
import spconv.pytorch as spconv
from spconv.pytorch import SparseConvTensor

device = 'cuda:0'

# Mostly-zero input: only a 16x16 patch of one channel is non-zero.
x_d = torch.zeros((2, 4, 1024, 1024))
x_d[0, 0, 0:16, 0:16] += 1.
x_d = x_d.to(device)
# spconv expects channels-last (N, H, W, C) for from_dense.
x = SparseConvTensor.from_dense(x_d.permute(0, 2, 3, 1))

conv_sparse = spconv.SparseConv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_sparse = nn.BatchNorm1d(4, momentum=0.1).to(device)  # BatchNorm1d runs on the sparse feature matrix
conv_bn_relu_sparse = spconv.SparseSequential(conv_sparse, bn_sparse, nn.ReLU(inplace=True)).to(device)

conv_norm = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_norm = nn.BatchNorm2d(4, momentum=0.1).to(device)
conv_bn_relu_norm = nn.Sequential(conv_norm, bn_norm, nn.ReLU(inplace=True)).to(device)

for i in range(10):
    print("round:", i)
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output1 = conv_bn_relu_norm(x_d)
    end_event.record()
    end_event.synchronize()  # wait for the GPU before reading the elapsed time
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_norm time: {elapsed_time_ms} milliseconds")

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output = conv_bn_relu_sparse(x)
    end_event.record()
    end_event.synchronize()
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_sparse time: {elapsed_time_ms} milliseconds")

[screenshot: timing output]
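
About the time library question: CUDA kernels are launched asynchronously, so time.time() around a GPU call often measures only the launch, not the kernel itself. It can still give valid numbers if you synchronize on both sides of the timed region, as in this sketch (reusing conv_bn_relu_sparse and x from the snippet above):

import time
import torch

torch.cuda.synchronize()   # finish any pending GPU work first
t0 = time.time()
encoder_output = conv_bn_relu_sparse(x)
torch.cuda.synchronize()   # wait for the kernels launched above to finish
print(f"wall time: {(time.time() - t0) * 1e3:.3f} ms")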