3dpose / 3D-Multi-Person-Pose

Hi,
Kudos for the great work and for releasing the code.

You mention that you do not use camera parameters. However, as I understand it, if you can compute a ratio between the predicted root and the GT root, then you are predicting the absolute root, right? Or are you at least assuming some fixed focal length? I could not find an exact focal length value in the code. Is this right?

Thanks!

We do not claim in this work that camera parameters (focal length f) are unnecessary. Following the AAAI'21 paper, the normalized root depth Z/f is estimated instead of Z, which avoids the influence of the camera intrinsic parameters, as stated in the supplementary material (TCN structure subsection, page 12).

To recover the actual Z value, the focal length must be set. In the code, for evaluation, the actual Z value is obtained by Procrustes alignment, which fits the focal length to the ground truth (GT):

3D-Multi-Person-Pose/lib/posematcher.py

Line 89 in 10cc7d7

depths = procrustes(depths[None, ...], depth_gt[None, ...])[0]
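As a rough illustration of what fitting the focal length against GT amounts to, the sketch below solves for a single scale factor by least squares. It is not the repository's `procrustes` implementation, whose exact behavior (for example, whether it also fits an offset) is not shown here. The helper `fit_scale` and the sample values are hypothetical.

```python
def fit_scale(pred_norm_depth, depth_gt):
    """Least-squares scale s minimizing sum((s * pred - gt)^2).

    Hypothetical helper: s plays the role of the focal length f
    when pred holds normalized depths Z/f and gt holds depths Z.
    """
    if len(pred_norm_depth) != len(depth_gt):
        raise ValueError("inputs must have the same length")
    denom = sum(p * p for p in pred_norm_depth)
    if denom == 0.0:
        raise ValueError("predicted depths are all zero")
    numer = sum(p * g for p, g in zip(pred_norm_depth, depth_gt))
    return numer / denom

# Made-up values: predicted Z/f for three people and their GT depths Z.
pred = [2.0, 3.0, 4.5]
gt = [3000.0, 4500.0, 6750.0]

f_est = fit_scale(pred, gt)          # fitted scale, acts as the focal length
depths = [f_est * p for p in pred]   # recovered Z values
```

Because the scale is fitted to GT, this kind of alignment is only available at evaluation time.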

When no GT is available, for example for in-the-wild videos, one option is to use a default focal length value. Recovering the actual depth Z accurately, however, requires the true focal length.
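A minimal sketch of the no-GT case, assuming the network output is the normalized depth Z/f: multiplying by a chosen focal length gives Z. The constant `DEFAULT_FOCAL_LENGTH`, its value, and the helper `recover_depth` are assumptions for illustration and are not taken from the repository.

```python
# Assumed default focal length; the value is illustrative only.
DEFAULT_FOCAL_LENGTH = 1500.0

def recover_depth(normalized_depth, focal_length=DEFAULT_FOCAL_LENGTH):
    """Convert normalized depths Z/f back to Z for a given focal length f."""
    return [focal_length * d for d in normalized_depth]

normalized = [2.0, 3.0]                 # made-up predictions of Z/f
z_default = recover_depth(normalized)   # uses the assumed default f

# If the camera's true focal length were 1200 instead, every depth computed
# with the default would be off by the same factor 1500 / 1200 = 1.25.
z_true = recover_depth(normalized, focal_length=1200.0)
ratios = [a / b for a, b in zip(z_default, z_true)]
```

The error from a wrong default is a uniform scaling of all depths, so relative ordering of people in depth is preserved even when absolute values are not.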

Focal length