wenjie710 / PivotNet

Source code of PivotNet (ICCV2023, PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction)

Very slow training speed

gaosanyuan opened this issue · comments

I found that the training speed is very slow when the pivot-related logic is used.
I guess the main reason is the dynamic-programming logic in the match cost and map loss, which runs on the CPU.
I would like to know whether it can be implemented on the GPU.
Thanks.

  1. Although pivot dynamic matching can be seamlessly implemented on a GPU, the minimal matrix multiplication involved suggests that migrating this part to a GPU may not yield significant time savings.
  2. To improve training time efficiency, we can adapt the current implementation by either increasing the batch size or decreasing the input image size.
  3. Another potential area for improvement lies in the assignment process. Currently, we perform instance-level assignment followed by point-level assignment. However, given that point-level assignment is inherent in instance-level assignment, obtaining the point-level assignment results concurrently with the instance-level assignment could enhance time efficiency (see the sketch after this list).
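As a rough illustration of the third point, the per-pair point-level results produced while filling the instance-level cost matrix could simply be cached and reused after the Hungarian step, instead of re-running the sequence DP. A minimal sketch, assuming a hypothetical `sequence_match(pred, gt)` that returns both the pair cost and the point indices (not the repository's actual API):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_with_cached_points(preds, gts, sequence_match):
    """Instance-level assignment that reuses the point-level DP results.

    preds / gts are lists of point arrays; sequence_match(pred, gt) is
    assumed to return (cost, point_indices) from the per-pair sequence DP.
    Both names are illustrative, not the repository's actual API.
    """
    m, n = len(preds), len(gts)
    cost = np.zeros((m, n))
    cached = {}
    for i, pred in enumerate(preds):
        for j, gt in enumerate(gts):
            c, idx = sequence_match(pred, gt)
            cost[i, j] = c
            cached[(i, j)] = idx  # point-level result, computed anyway

    rows, cols = linear_sum_assignment(cost)  # instance-level (Hungarian)
    # The point-level assignment now comes for free from the cache.
    return [(i, j, cached[(i, j)]) for i, j in zip(rows, cols)]
```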

Thanks @wenjie710

But

  1. When you train the model with a hybrid matching strategy and shift the polygons many times, you can see that pivot dynamic matching really is a bottleneck.
  2. I think we should always change only one group of variables when comparing two experiments.
  3. When computing the match cost, the time complexity is O(m x n), where m is the number of queries and n is the number of GTs multiplied by the number of shifts. When computing the loss, the time complexity is only O(#GTs). We can see that #GTs << #queries x #shifts x #GTs (a toy count follows this list), so although the point-level assignment could be obtained during the instance-level assignment, it may not be necessary.
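To make the asymmetry concrete, here is a toy count with made-up sizes (the real numbers depend on the configuration):

```python
# Illustrative only: how many times the sequence DP runs in each phase.
num_queries, num_gts, num_shifts = 100, 20, 5

dp_calls_match_cost = num_queries * num_gts * num_shifts  # every (query, GT, shift) pair
dp_calls_loss = num_gts                                   # only the matched pairs

print(dp_calls_match_cost, dp_calls_loss)  # 10000 vs. 20
```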

I think I may have misunderstood you before, so I reopened this issue.

  1. "I think we should always change only one group of variables when comparing two experiments" Could you clarify which two experiments you are referring to? Are they detailed in the paper or the accompanying code?

  2. Could you elaborate on the concept of "shifting polygon many times"? Are you referring to the approach employed by MapTR, or is it a different concept?

  3. The complexity of the sequence matching cost is O(NT), where N is the max number of points in an instance and T is the max length of the ground-truth sequences, which is independent of the number of GT and DT instances (see the code). Therefore, obtaining the point-level assignment results concurrently with the instance-level assignment can enhance time efficiency. A sketch of such a DP follows this list.

  4. I think implementing the matching part as a C++/CUDA extension would make it faster, if necessary.
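For concreteness, here is a minimal sketch of an O(NT) monotone-alignment DP of the kind described in point 3. It is written under my own simplifying assumptions (function name, L2 distance, T <= N, free endpoints), not the exact formulation in the repository:

```python
import numpy as np

def pivot_dynamic_matching(pred, gt):
    """O(N*T) monotone alignment of T GT pivots to N predicted points.

    pred: (N, 2) predicted points; gt: (T, 2) GT pivots, assuming T <= N.
    Returns (total_cost, indices), where indices[j] is the predicted point
    matched to gt[j]. Illustrative sketch only.
    """
    N, T = len(pred), len(gt)
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, T)

    INF = float("inf")
    D = np.full((N, T), INF)  # D[i, j]: best cost with pred[i] matched to gt[j]
    parent = np.full((N, T), -1, dtype=int)
    D[:, 0] = dist[:, 0]      # gt[0] may match any predicted point

    for j in range(1, T):
        best_val, best_k = INF, -1
        for i in range(j, N):  # at least j earlier points are needed for gt[:j]
            # The running minimum gives min over k < i of D[k, j-1] in O(1)
            # per step, keeping the whole DP at O(N*T) rather than O(N^2*T).
            if D[i - 1, j - 1] < best_val:
                best_val, best_k = D[i - 1, j - 1], i - 1
            D[i, j] = dist[i, j] + best_val
            parent[i, j] = best_k

    # Backtrack from the best endpoint of the last pivot.
    i = int(np.argmin(D[:, T - 1]))
    total_cost = float(D[i, T - 1])
    indices = [i]
    for j in range(T - 1, 0, -1):
        i = int(parent[i, j])
        indices.append(i)
    indices.reverse()
    return total_cost, indices
```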

Please let me know if you have any further concerns.


Thanks for your reply.
I believe there are many ways to improve the training speed.
But training speed becomes a problem when several strategies are used together, such as shifting the polygons many times (as mentioned in MapTR) and hybrid matching. In that setting, the pivot matching process is very slow on the CPU (based on my experiments) when the number of queries is large.
So I think it is necessary to have a CUDA version of pivot dynamic matching that computes both the matching score and the matching indices.
Thanks.
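As a possible middle ground before writing a C++/CUDA extension, the pairwise DP can be batched in plain PyTorch so that only the short loop over DP steps remains in Python and all (query, GT, shift) pairs are processed together on the GPU. A rough sketch under my own assumptions about shapes and names (not the repository's actual interface):

```python
import torch

def batched_match_cost(pred, gt):
    """Run the O(N*T) monotone-alignment DP for B (query, GT) pairs at once.

    pred: (B, N, 2) predicted points; gt: (B, T, 2) GT pivots, with T <= N.
    Returns a (B,) tensor of matching costs. Shapes and names are my own
    assumptions, not the repository's interface.
    """
    B, N, _ = pred.shape
    T = gt.size(1)
    dist = torch.cdist(pred, gt)  # (B, N, T) pairwise L2 distances

    INF = torch.finfo(dist.dtype).max / 4  # large sentinel, safe to add to
    D = torch.full_like(dist, INF)
    D[:, :, 0] = dist[:, :, 0]
    for j in range(1, T):  # only this short loop over pivots stays in Python
        # Prefix minimum over the point axis implements min_{k < i} D[k, j-1]
        # for all i (and all B pairs) simultaneously, on the GPU.
        prev = torch.cummin(D[:, :, j - 1], dim=1).values
        D[:, 1:, j] = dist[:, 1:, j] + prev[:, :-1]
    # Matching indices for the assigned pairs can be recovered afterwards
    # with an argmin-based backtracking pass over D.
    return D[:, :, T - 1].min(dim=1).values
```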