octiapp / KerasPersonLab

Keras-tensorflow implementation of PersonLab (https://arxiv.org/abs/1803.08225)


About Inference Time on Large Images with Dozens of Persons

hnuzhy opened this issue · comments

Thank you for reproducing PersonLab. I used it to train a model on COCO train2017 and got a decent intermediate result. However, when I use the model to detect on a 1080p (1920x1080) image with about 40 persons, the inference time is very long (about 3 s for all detections and 13 s for group matching), even with GPUs.
I found it is mainly caused by the Group Joints stage, especially the function compute_heatmaps() with inputs kp_maps and short_offsets, which performs the refinement of the keypoint heatmaps. Theoretically, the inference time of a bottom-up method should not grow linearly as the number of persons increases. Is there a problem with the implementation of compute_heatmaps()?
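For context, the refinement in question is essentially Hough voting with the short offsets: each pixel's heatmap score is cast as a vote at the location the offset points to. A minimal NumPy sketch (the array layouts here are assumptions for illustration, not the repo's exact signatures):

```python
import numpy as np

def refine_heatmaps(kp_maps, short_offsets):
    # kp_maps: (H, W, K) keypoint heatmaps
    # short_offsets: (H, W, 2*K), interleaved (dy, dx) per keypoint (assumed layout)
    H, W, K = kp_maps.shape
    refined = np.zeros_like(kp_maps)
    ys, xs = np.mgrid[0:H, 0:W]
    for k in range(K):
        dy = short_offsets[..., 2 * k]
        dx = short_offsets[..., 2 * k + 1]
        # Each pixel votes at the location its short offset points to
        ty = np.clip(np.round(ys + dy).astype(int), 0, H - 1)
        tx = np.clip(np.round(xs + dx).astype(int), 0, W - 1)
        # np.add.at accumulates votes even when targets collide
        np.add.at(refined[..., k], (ty, tx), kp_maps[..., k])
    return refined
```

Written this way the cost is O(H·W·K), independent of the number of persons, which is why the refinement time should track image size rather than instance count. A Python-level loop over individual pixels, by contrast, would be orders of magnitude slower at 1080p.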

Hi @hnuzhy. Thanks so much for your interest in KerasPersonLab. I'm really glad you're getting decent results with it. I would also appreciate it if you would let me know whether you ever achieve results comparable to the paper with this code.

I haven't had a chance to work on this in quite a while, but post-processing time was indeed an issue. In theory, bottom-up methods don't take longer with more instances for the actual network forward pass, but the post-processing of the network output typically does depend on the number of instances present. In this case, we'd expect the skeleton grouping to become more time-consuming as the instance count increases, while the heatmap refinement should, in theory, only depend on the image size.

If I had more time to work on this repo (pull requests are also welcome), the post-processing speed could definitely be improved. I was focusing more on model inference and start-up time, which I was improving in the updated_model_def branch, but did not manage to finish up.

@jricheimer Your reply is detailed and well-judged. Limited by training resources and time, I currently have only an intermediate model (about 30 epochs), and it is uncertain whether it can achieve the results in the paper. Nevertheless, I will share the model once its training is completed. In addition, I only care about the pose estimation part, so I deleted the instance segmentation part of the original code. According to the paper, these two parts are not strongly coupled, so I hope the final model's accuracy will not be affected.
Actually, I'm trying to improve the post-processing of your KerasPersonLab with the model's outputs these days. Specifically, I intend to rewrite compute_heatmaps(), get_keypoints() and group_skeletons() and move them into TensorFlow, hoping to speed up post-processing. These steps really can't be integrated into the DNN model itself. There are still some problems, and it is not clear yet whether it will succeed.
PersonLab was a SOTA method at ECCV 2018, and in recent years there have been many fast and accurate algorithms of the same kind. If the attempt fails, I may give up and turn to them. Anyway, thank you for your work and answers.
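The per-keypoint voting loop can indeed be expressed as a single scatter-add on the GPU. A hedged TensorFlow sketch of that idea (tensor layouts and function names are my assumptions, not the repo's actual API; eager mode):

```python
import tensorflow as tf

def refine_heatmaps_tf(kp_maps, short_offsets):
    """Short-offset Hough voting as one scatter-add in TensorFlow.

    kp_maps:       (H, W, K) float32 keypoint heatmaps
    short_offsets: (H, W, K, 2) float32 offsets as (dy, dx) -- assumed layout
    """
    shape = tf.shape(kp_maps)
    H, W = shape[0], shape[1]
    ys, xs = tf.meshgrid(tf.range(H), tf.range(W), indexing='ij')
    grid = tf.cast(tf.stack([ys, xs], axis=-1), tf.float32)        # (H, W, 2)
    # Target location each pixel votes at, rounded and clipped to the image
    targets = tf.cast(tf.round(grid[:, :, None, :] + short_offsets), tf.int32)
    ty = tf.clip_by_value(targets[..., 0], 0, H - 1)               # (H, W, K)
    tx = tf.clip_by_value(targets[..., 1], 0, W - 1)
    k_idx = tf.broadcast_to(tf.range(shape[2])[None, None, :], shape)
    indices = tf.reshape(tf.stack([ty, tx, k_idx], axis=-1), [-1, 3])
    updates = tf.reshape(kp_maps, [-1])
    # scatter_nd sums duplicate indices, i.e. accumulates colliding votes
    return tf.scatter_nd(indices, updates, shape)
```

Because everything is a tensor op, this version can run on the GPU alongside the forward pass instead of looping in Python on the CPU.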

What other SOTA pose estimation methods are you referring to?

For me, the code is very very slow to start up and do the pre-processing steps.

@hnuzhy you can turn off the refinement part since it doesn't affect accuracy that much and is really slow.
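If refinement is turned off, keypoint candidates can be peak-picked directly from the raw heatmaps. A minimal sketch of that shortcut (a hypothetical drop-in, not the repo's actual get_keypoints(); names and the dict format are assumptions):

```python
import numpy as np

def get_keypoints_raw(kp_maps, threshold=0.1):
    # kp_maps: (H, W, K) raw keypoint heatmaps, no short-offset refinement.
    # A pixel is a candidate if it is a 4-neighborhood local maximum
    # above the score threshold.
    H, W, K = kp_maps.shape
    padded = np.pad(kp_maps, ((1, 1), (1, 1), (0, 0)), mode='constant')
    peaks = []
    for k in range(K):
        m = padded[..., k]
        center = m[1:-1, 1:-1]
        is_max = ((center >= m[:-2, 1:-1]) & (center >= m[2:, 1:-1]) &
                  (center >= m[1:-1, :-2]) & (center >= m[1:-1, 2:]) &
                  (center > threshold))
        for y, x in zip(*np.nonzero(is_max)):
            peaks.append({'id': k, 'xy': (x, y), 'conf': center[y, x]})
    return peaks
```

Skipping the voting step trades a small amount of localization accuracy for a large, image-size-proportional chunk of post-processing time.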