traveller59 / spconv

Spatial Sparse Convolution Library


Questions about performance

Jaywxy opened this issue · comments

Amazing! I was using spconv 1.x before, but now I have switched to spconv 2.1, and it is amazing: one epoch used to take 3 hours to train, but now it takes only 1.5 hours, and GPU memory usage has dropped by about 40%. But I still have some questions that are unclear to me. Could you help me answer them or give me some suggestions?
These are the data and data types passed into the model. How can I modify them to make training more efficient?

Hi, can you share which model you trained and which profiling method you used to measure the training time? Thanks

The model I use belongs to a senior labmate, and I can't share it with you yet. You can get the training time by checking the training process; isn't it possible to record the time of each epoch?

Actually, I tried to measure the training and inference time of a single sparse convolution layer in many ways, such as time.time(), torch.cuda.Event record, and the PyTorch profiling tool, but I didn't see any improvement in the actual runtime. So may I ask what type of neural network you are using, or its name? I don't need you to share a copy with me.

Hi, have you solved it? I have the same problem.

I didn't solve my issue; the model I tested was a 2D sparse convolution.

But I have some recommendations for your code (see the sketch after this list):

  1. First, warm up your GPU before measuring the time. For example, run 50 epochs on the dense convolution net first, then run 100 epochs for both the dense and the sparse nets, and take the average training time of each.
  2. Try another time measurement method, for example torch.cuda.Event record, which you can look up on Google or ask ChatGPT about.
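
A minimal sketch of both points together, assuming spconv 2.x and a CUDA device (the layer shapes, iteration counts, and variable names below are made up for illustration):

import torch
import torch.nn as nn
import spconv.pytorch as spconv

device = "cuda:0"

# Toy layers just for the comparison.
dense = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False).to(device)
sparse = spconv.SparseConv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False).to(device)

# Mostly-zero input so the sparse path has something to exploit.
x_d = torch.zeros((2, 4, 1024, 1024), device=device)
x_d[0, 0, :16, :16] = 1.
x_s = spconv.SparseConvTensor.from_dense(x_d.permute(0, 2, 3, 1))

def avg_time_ms(mod, inp, warmup=50, iters=100):
    with torch.no_grad():
        for _ in range(warmup):      # warm-up: kernel selection, caches, GPU clocks
            mod(inp)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            mod(inp)
        end.record()
        torch.cuda.synchronize()     # wait for all queued kernels before reading
    return start.elapsed_time(end) / iters

print("dense :", avg_time_ms(dense, x_d), "ms")
print("sparse:", avg_time_ms(sparse, x_s), "ms")

Averaging over many iterations after a warm-up keeps one-time costs (cuDNN autotuning, spconv kernel selection) out of the measurement.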

Thanks for your quick reply!

Regarding the first point, have you tried doing this, and does it work?

It doesn't work for me. I have tried all the methods and techniques I know of for SparseConv2d, but I didn't test the 3D case. If you make any progress, please share it with me, thanks.

In fact, I printed the running time of some modules during my model's inference and found that they were not much more efficient than normal convolution. I still don't understand what the problem is.
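
If you want per-module timings during inference, a sketch with torch.profiler may be more trustworthy than printed timestamps, because it reports CUDA kernel time rather than only Python-side wall time (the model and input below are stand-ins; substitute your own modules):

import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Stand-in model and input, for illustration only.
model = nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda()
sample = torch.zeros(2, 4, 1024, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        with record_function("conv_block"):  # label the region you care about
            model(sample)

# Sort by CUDA time so asynchronous kernel launches are accounted for.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))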

Hi, I used torch.cuda.Event to measure the time and found no problem. Do you think this is the right approach, and did you do it this way before? Why is it not feasible to use the time library?

import torch
import torch.nn as nn
import spconv.pytorch as spconv
from spconv.pytorch import SparseConvTensor

device = 'cuda:0'

# Mostly-zero input: only a 16x16 patch of one channel is non-zero.
x_d = torch.zeros((2, 4, 1024, 1024))
x_d[0, 0, 0:16, 0:16] += 1.
x_d = x_d.to(device)
# spconv expects channels-last (N, H, W, C) for from_dense.
x = SparseConvTensor.from_dense(x_d.permute(0, 2, 3, 1))

conv_sparse = spconv.SparseConv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_sparse = nn.BatchNorm1d(4, momentum=0.1).to(device)  # BatchNorm1d runs on the sparse feature matrix
conv_bn_relu_sparse = spconv.SparseSequential(conv_sparse, bn_sparse, nn.ReLU(inplace=True)).to(device)

conv_norm = nn.Conv2d(4, 4, kernel_size=3, stride=2, padding=1, bias=False, dilation=1).to(device)
bn_norm = nn.BatchNorm2d(4, momentum=0.1).to(device)
conv_bn_relu_norm = nn.Sequential(conv_norm, bn_norm, nn.ReLU(inplace=True)).to(device)

for i in range(10):
    print("round:", i)
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output1 = conv_bn_relu_norm(x_d)
    end_event.record()
    end_event.synchronize()  # wait for the GPU before reading the elapsed time
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_norm time: {elapsed_time_ms} milliseconds")

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    encoder_output = conv_bn_relu_sparse(x)
    end_event.record()
    end_event.synchronize()
    elapsed_time_ms = start_event.elapsed_time(end_event)
    print(f"conv_bn_relu_sparse time: {elapsed_time_ms} milliseconds")

[screenshot: timing output]
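
About the time library question: CUDA kernels are launched asynchronously, so time.time() around a GPU call often measures only the launch, not the kernel itself. It can still give valid numbers if you synchronize on both sides of the timed region, as in this sketch (reusing conv_bn_relu_sparse and x from the snippet above):

import time
import torch

torch.cuda.synchronize()   # finish any pending GPU work first
t0 = time.time()
encoder_output = conv_bn_relu_sparse(x)
torch.cuda.synchronize()   # wait for the kernels launched above to finish
print(f"wall time: {(time.time() - t0) * 1e3:.3f} ms")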