hova88 / PointPillars_MultiHead_40FPS

A REAL-TIME 3D detection network [Pointpillars] compiled by CUDA/TensorRT/C++.

Wrong result after the first time

vietanhdev opened this issue · comments

Thank you for your great source code.
When I run the example more than once, I get seemingly random results. Can you explain the reason and give me some ideas for fixing this bug?

The bug is that the variables in PointPillars::DoInference are not cleared between runs. Just call cudaMemset on these variables at the beginning of PointPillars::DoInference. Thank you for reminding me. I'll fix it later.

Maybe we need to reallocate memory for some buffers? For example, when the second run has more points in the point cloud than the first run, we need more memory?

No need. The buffers are sized for the maximum number of points, such as 400000. I think the reason is that variables from the previous frame, like dev_scattered_feature_, pfe_buffers_ ..., were not cleared. Use cudaMemset to clear these variables before processing the current frame.
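The stale-buffer effect described above can be illustrated with a host-side analogue. This is only a sketch: `ScatterBuffer`, `Scatter`, and `RunFrame` are hypothetical names, not code from this repo, and `std::fill` stands in for what `cudaMemset(ptr, 0, bytes)` would do on a device buffer such as dev_scattered_feature_.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a preallocated buffer that is reused across
// frames. It is sized once, for the largest frame expected.
struct ScatterBuffer {
    std::vector<float> cells;
    explicit ScatterBuffer(std::size_t num_cells) : cells(num_cells, 0.0f) {}

    // Host-side analogue of zeroing a device buffer with cudaMemset.
    void Clear() { std::fill(cells.begin(), cells.end(), 0.0f); }
};

// Writes one value per occupied cell. Cells that are not listed in
// `occupied` are left untouched, so old values survive there.
void Scatter(ScatterBuffer& buf, const std::vector<std::size_t>& occupied,
             float value) {
    for (std::size_t idx : occupied) {
        assert(idx < buf.cells.size());
        buf.cells[idx] = value;
    }
}

// Counts non-zero cells, a rough proxy for what the next stage would see.
std::size_t CountNonZero(const ScatterBuffer& buf) {
    return static_cast<std::size_t>(
        std::count_if(buf.cells.begin(), buf.cells.end(),
                      [](float v) { return v != 0.0f; }));
}

// Processes one frame. `clear_first` toggles the suggested fix.
std::size_t RunFrame(ScatterBuffer& buf,
                     const std::vector<std::size_t>& occupied,
                     bool clear_first) {
    if (clear_first) buf.Clear();
    Scatter(buf, occupied, 1.0f);
    return CountNonZero(buf);
}
```

Without clearing, a second frame that occupies cells {5, 6} after a first frame that occupied {0, 1, 2} leaves five non-zero cells in the buffer instead of two, so the next stage sees leftovers from the earlier frame.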

@hova88 @vietanhdev Have you solved the problem of random results? I also encountered this problem. Thank you.

The problems above should have been solved by using cudaMemset to zero the buffers. I'm not sure whether you are encountering the same problem. Please describe your problem in detail.

We have tested this repo's code on the demo point cloud dataset and on my own dataset (only one frame). With the demo data, the detection results are always changing. With my dataset, the boxes also change.

finished inference.
index : 3688, num_objects: 228
in_num_points: 267990
------------------------------------
Module        Time        
------------------------------------
Preprocess    0.431291 ms
Pfe           4.26658  ms
Scatter       0.204408 ms
Backbone      3.05398  ms
Postprocess   4.36596  ms
Summary       12.3263  ms
------------------------------------
finished inference.
index : 3689, num_objects: 230
in_num_points: 267990
------------------------------------
Module        Time        
------------------------------------
Preprocess    0.428828 ms
Pfe           4.27889  ms
Scatter       0.208322 ms
Backbone      3.0582   ms
Postprocess   4.33036  ms
Summary       12.3084  ms
------------------------------------
finished inference.
index : 3690, num_objects: 227
in_num_points: 267990

My dataset: two predictions from the same frame.
prediction1: [image 1]

prediction2: [image 2]

Orz....

This is definitely caused by my stupid post-processing code (postprocess.cu). This code should be rewritten in plain C, because the whole post-processing computation is quite light. There is no point in using CUDA for it. I will fix this later.

Thanks. Two more points are worth pointing out:

  1. We observed CUDA memory usage growing during loop inference, and we fixed it with cudaFree; you can check it in the pull requests.
  2. The out_labels value is not class_idx, because two variables share the same name i in DoPostprocessCuda; this could be fixed too.
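The second point, a reused loop variable producing the wrong label, can be sketched as follows. This is a hypothetical reconstruction of that class of bug, not the actual DoPostprocessCuda code: `HeadOutput`, `LabelsShadowed`, and `LabelsFixed` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-head result: the class the head predicts and the
// number of boxes it kept.
struct HeadOutput {
    int class_idx;
    std::size_t num_boxes;
};

// Buggy variant: the inner loop declares its own `i`, which hides the
// outer one. The label written is the box index, not a class index.
std::vector<int> LabelsShadowed(const std::vector<HeadOutput>& heads) {
    std::vector<int> out_labels;
    for (std::size_t i = 0; i < heads.size(); ++i) {
        const std::size_t num_boxes = heads[i].num_boxes;
        for (std::size_t i = 0; i < num_boxes; ++i) {  // shadows outer i
            out_labels.push_back(static_cast<int>(i));
        }
    }
    return out_labels;
}

// Fixed variant: distinct names make it obvious which index is which.
std::vector<int> LabelsFixed(const std::vector<HeadOutput>& heads) {
    std::vector<int> out_labels;
    for (std::size_t head = 0; head < heads.size(); ++head) {
        assert(heads[head].class_idx >= 0);
        for (std::size_t box = 0; box < heads[head].num_boxes; ++box) {
            out_labels.push_back(heads[head].class_idx);
        }
    }
    return out_labels;
}
```

With two heads, one of class 3 holding two boxes and one of class 7 holding one box, the shadowed version yields labels 0, 1, 0 while the fixed version yields 3, 3, 7. Compiling with `-Wshadow` on GCC or Clang reports this kind of reuse.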

OK, thanks for your commit.

@hova88 After testing with the latest code (commit id: b093730), the output results are still changing. Can you give me some debugging advice? Thank you.

Is it possible that each pillar randomly samples a certain number of points in the algorithm? This may affect final output.
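One way such per-pillar sampling could change the output, offered as an assumption rather than a verified description of this repo: if each pillar keeps at most a fixed number of points and slots are claimed in arrival order, then the kept subset depends on the order in which points arrive, which on a GPU can vary from run to run. The hypothetical `CappedMean` below shows the effect on a simple per-pillar feature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Averages the first `max_points` values of a pillar, in arrival order.
// Points beyond the cap are dropped, as in a fixed-capacity pillar.
float CappedMean(const std::vector<float>& z_values, std::size_t max_points) {
    assert(max_points > 0);
    const std::size_t kept = std::min(z_values.size(), max_points);
    if (kept == 0) return 0.0f;
    float sum = 0.0f;
    for (std::size_t k = 0; k < kept; ++k) {
        sum += z_values[k];
    }
    return sum / static_cast<float>(kept);
}
```

For the same four points arriving as {1, 2, 3, 4} or as {4, 3, 2, 1}, a cap of 2 gives means of 1.5 and 3.5, while a cap of 4 gives 2.5 for both. Under this assumption, sorting points or assigning slots deterministically before truncation would be one thing to try when debugging.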

Maybe... This is just an example to show that OpenPCDet can be easily translated into CUDA/C with TensorRT. I have not strictly checked the differences in each part between OpenPCDet and PointPillars_MultiHead_40FPS, because the 10_sweeps cloud strategy and the {AB,C,DE,F..}-like head are not used in my own practice. But this is very important in your daily project, and fixing the bug will still require a lot of patience.

I got it. Thank you.