How to find the scale necessary to compute the pose accuracy?

Question

abenbihi opened this issue 5 years ago

Hello Paul-Edouard,

Thank you for releasing the code of your paper.

I am currently trying to compute the camera pose accuracy and am facing the scale problem you mention in Section 5.1, paragraph "Datasets":
"A metric scale cannot be recovered with SfM reconstruction but is important to compute localization metrics. We therefore manually label each SfM model using metric distances measured in Google Maps."

Could you please explain in more detail how to recover the scale, or point me to a resource that explains the steps?

Thank you

Paul-Edouard Sarlin · Answer 1 · Sun Oct 20 2019 03:06:40 GMT+0800 (China Standard Time)

Hi Assia,

Thank you for your interest in our work. We manually labeled some keypoints in a couple of images and measured the real distances between them in Google Maps. This is not very accurate (+/- 10 cm, I think), but it at least makes absolute pose errors more meaningful. We did this for most evaluation sequences (10 of them, excluding the Lincoln statue). Please find the scales below; they can be loaded by our evaluation script here. The script used to obtain them is here.
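
The script itself is only linked, so here is a minimal sketch of the idea rather than the authors' code. It assumes each labeled keypoint already has a corresponding 3D point in the SfM model (for example through the model's 2D-3D correspondences), and that a metric distance was measured for each labeled pair. The function name `estimate_scale` and its input layout are hypothetical.

```python
import numpy as np


def estimate_scale(points_a, points_b, metric_distances):
    """Estimate one SfM-to-metric scale factor from labeled point pairs.

    points_a, points_b: (N, 3) SfM coordinates; row i of each array is one
        labeled pair (hypothetical input format).
    metric_distances: (N,) real-world distances in meters for those pairs.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    metric = np.asarray(metric_distances, dtype=float)
    # Distance between each pair in arbitrary SfM units.
    sfm_distances = np.linalg.norm(points_a - points_b, axis=1)
    # Each pair votes for a ratio (meters per SfM unit); the median limits the
    # influence of a single badly labeled pair.
    return float(np.median(metric / sfm_distances))


if __name__ == "__main__":
    # Two pairs 1 and 2 SfM units apart, measured as 10 m and 20 m.
    s = estimate_scale([[0, 0, 0], [0, 0, 0]], [[1, 0, 0], [0, 2, 0]], [10.0, 20.0])
    print(s)  # 10.0
```

If all pairs are trusted, a least-squares fit, s = Σ dₘ·dₛ / Σ dₛ², is a reasonable alternative to the median.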

Disclaimer: I would instead recommend following the approach used for the CVPR 2019 Image Matching Workshop, which uses essential matrix estimation to obtain the relative pose (up to scale) and evaluates the angular errors between the rotation matrices and between the translation vectors. This evaluation is fairer to the detectors, since it does not use the depth maps, which are noisy and have missing values at the edges of buildings.
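
A minimal sketch of the two angular errors, assuming the estimated relative rotation and translation have already been obtained from an essential matrix (for example with OpenCV's `cv2.findEssentialMat` followed by `cv2.recoverPose`). The function names are hypothetical, and the exact conventions of the workshop's evaluation code may differ; for instance, this sketch keeps the sign of the translation direction.

```python
import numpy as np


def rotation_error_deg(R_gt, R_est):
    """Angle in degrees of the relative rotation R_gt^T @ R_est."""
    R_gt = np.asarray(R_gt, dtype=float)
    R_est = np.asarray(R_est, dtype=float)
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    # Clip because round-off can push the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_angle_error_deg(t_gt, t_est):
    """Angle in degrees between two translation directions; their norms are ignored."""
    t_gt = np.asarray(t_gt, dtype=float)
    t_est = np.asarray(t_est, dtype=float)
    cos = np.dot(t_gt, t_est) / (np.linalg.norm(t_gt) * np.linalg.norm(t_est))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


if __name__ == "__main__":
    # A 90-degree rotation about z compared with the identity.
    R_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    print(rotation_error_deg(np.eye(3), R_z90))
    # Same direction, different length: no angular error.
    print(translation_angle_error_deg([1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))
```

Because only directions are compared, the unknown SfM scale never enters this metric.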

# Multiply the translation vectors, 3D point coordinates, and depth maps by the given scaling factor to obtain a metric reconstruction.
reichstag 15.564790777735576
british_museum 2.516522885798819
florence_cathedral_side 6.810832638368868
london_bridge 28.55251300930921
milan_cathedral 12.427682838857516
mount_rushmore 10.42442665534655
piazza_san_marco 8.569324936325351
sagrada_familia 4.141963811200239
st_pauls_cathedral 6.9462380123536045
united_states_capitol 20.897868321133668
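
The list is plain "scene scale" text, so it can also be used without the evaluation script. A minimal sketch, assuming the list is saved verbatim; `parse_scales` and `apply_scale` are hypothetical names, and the scaling simply follows the comment line: translations, 3D points, and depths are multiplied by the factor, while rotations are unchanged.

```python
import numpy as np

# The scale list from the answer above, copied verbatim (only two entries shown).
SCALES_TEXT = """
# Multiply the translation vectors, 3D point coordinates, and depth maps by the given scaling factor to obtain a metric reconstruction.
reichstag 15.564790777735576
british_museum 2.516522885798819
"""


def parse_scales(text):
    """Parse 'scene_name scale' lines, skipping blank lines and '#' comments."""
    scales = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()
        scales[name] = float(value)
    return scales


def apply_scale(t, points3d, depth, scale):
    """Return metric copies of a translation vector, 3D points, and a depth map.

    Missing depth values stored as 0 stay 0 after scaling.
    """
    return (scale * np.asarray(t, dtype=float),
            scale * np.asarray(points3d, dtype=float),
            scale * np.asarray(depth, dtype=float))


if __name__ == "__main__":
    scales = parse_scales(SCALES_TEXT)
    t_m, pts_m, depth_m = apply_scale([0.1, 0.0, 0.0], [[0.0, 0.0, 1.0]],
                                      [[0.0, 0.5]], scales["reichstag"])
    print(t_m, depth_m)
```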

Assia · Answer 2 · Sun Oct 20 2019 05:24:13 GMT+0800 (China Standard Time)

Thank you, the notebook is very useful.

The metric based only on the angles is indeed a good alternative that would eliminate the problem.
However, some benchmarks seem to require metric camera poses (e.g. CMU-Seasons at https://www.visuallocalization.net/benchmark/), with accuracy based on both the translation error in meters and the rotation error.
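
A minimal sketch of such a scale-dependent metric, assuming world-to-camera poses (x_cam = R·X + t, as in COLMAP) that have already been converted to meters with the factors above. The function names are hypothetical, and the threshold triple is an assumption based on the values commonly reported for CMU-Seasons; check the benchmark page for the exact definition.

```python
import numpy as np

# Assumed (meters, degrees) thresholds; verify against the benchmark's definition.
THRESHOLDS = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]


def pose_errors(R_gt, t_gt, R_est, t_est):
    """Position error (m) between camera centers and rotation error (deg)."""
    R_gt, R_est = np.asarray(R_gt, dtype=float), np.asarray(R_est, dtype=float)
    t_gt, t_est = np.asarray(t_gt, dtype=float), np.asarray(t_est, dtype=float)
    # Camera center C = -R^T t for a world-to-camera pose.
    c_gt = -R_gt.T @ t_gt
    c_est = -R_est.T @ t_est
    pos_err = float(np.linalg.norm(c_gt - c_est))
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_err = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return pos_err, rot_err


def accuracy(errors, thresholds=THRESHOLDS):
    """Fraction of queries whose position AND rotation errors are within each threshold."""
    errors = np.asarray(errors, dtype=float).reshape(-1, 2)
    return [float(np.mean((errors[:, 0] <= dt) & (errors[:, 1] <= dr)))
            for dt, dr in thresholds]


if __name__ == "__main__":
    R = np.eye(3)
    err = pose_errors(R, np.zeros(3), R, np.array([0.3, 0.0, 0.0]))
    print(err, accuracy([err]))
```

The position error is only meaningful in meters if the SfM model was scaled first, which is why the metric scale matters for this kind of benchmark.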

So I think I will compute both metrics and close the issue.