Extending Pose Estimation Model for 3D Objects: Customization and Challenges

Question

manojs8473 opened this issue 9 months ago

Hello!

First of all, thank you for delivering this incredible work! I've had fantastic results using this approach for human body pose estimation.

I'm interested in customizing the current model to estimate the 3D points of objects like a baseball bat or tennis racket in addition to the 3D points of the human body, which the model already does successfully. I have a few questions and doubts regarding this task:

Customizing Skeleton Hierarchy: Is it possible to customize the current skeleton hierarchy and add new bones or edges to represent the bat or racket? I assume this would be necessary to include these objects in the pose estimation.

Architectural Changes: What sort of changes will be required in the architecture of the model to accommodate the estimation of 3D points for objects? Are there any specific layers or components that need to be modified or added?

Training Data Volume: Could you provide insights into the volume of data that the model would require for training to achieve good accuracy in estimating the 3D points of both the human body and objects like baseball bats and tennis rackets?

Your comments and suggestions on how to approach this customization would be immensely appreciated. Thank you!

István Sárándi · Answer 1 · Thu Oct 12 2023 18:18:46 GMT+0800 (China Standard Time)

Human-object interaction reconstruction is an existing research direction, and you may find relevant approaches there.

But a simple option could be to define a few object keypoints and train them jointly with the body keypoints. Note that human pose estimation generally assumes the keypoints are always present, which may not hold for an object. For this approach the architecture wouldn't change much: only the last 1x1 conv layer would need to output more heatmaps.
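To make the two points above concrete, here is a minimal sketch in plain Python. It is not the model's actual code. It assumes the head is a 1x1 conv with one output channel per keypoint (if the real head emits several channels per keypoint, `n_new` would scale accordingly), and all names (`extend_head`, `conv1x1_at_pixel`, `masked_l1_loss`, `init_std`) are hypothetical. The first part appends output channels for the object keypoints while keeping the pretrained rows; the second shows one common way to handle keypoints that may be absent, by masking them out of the loss.

```python
import random


def extend_head(weights, biases, n_new, init_std=0.01, seed=0):
    """Append n_new output channels to a 1x1 conv head (hypothetical helper).

    weights: one row per output channel, each row of length in_channels.
    biases: one bias per output channel.
    Pretrained rows are copied unchanged; new rows get small random values.
    """
    rng = random.Random(seed)
    in_channels = len(weights[0])
    new_rows = [[rng.gauss(0.0, init_std) for _ in range(in_channels)]
                for _ in range(n_new)]
    return [row[:] for row in weights] + new_rows, list(biases) + [0.0] * n_new


def conv1x1_at_pixel(features, weights, biases):
    """At a single pixel, a 1x1 conv is a plain linear map over channels."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def masked_l1_loss(pred, target, present):
    """Average per-keypoint L1 error over keypoints marked as present.

    pred, target: lists of (x, y, z) tuples; present: list of bools.
    Absent keypoints (e.g. no racket in the image) contribute nothing.
    """
    total, count = 0.0, 0
    for p, t, is_present in zip(pred, target, present):
        if is_present:
            total += sum(abs(a - b) for a, b in zip(p, t))
            count += 1
    return total / count if count else 0.0


# Toy head: 2 body keypoints, 3 feature channels; add 2 object keypoints.
old_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
old_b = [0.1, -0.1]
new_w, new_b = extend_head(old_w, old_b, n_new=2)
print(len(new_w), len(new_b))  # 4 4
```

Because the pretrained rows are preserved, the body keypoint outputs are unchanged right after the extension, and only the new rows start from scratch. Whether masking is the right way to treat missing object keypoints depends on the training setup; it is shown here as one assumption, not as the method used by the model.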

It's hard to predict the necessary training data volume. But if the baseball/tennis training and test data are sufficiently similar, it probably won't take much, provided the model is finetuned from the backbone pretrained on the large combination of human datasets.