Typos in paper?

chrisoffner opened this issue 4 months ago · comments

In Section 3.1, under Discussion, the paper says

Using a generic architecture allows to leverage strong pretraining technique, ultimately surpassing what existing task-specific architectures can achieve.

Should this be "techniques" or "a strong pretraining technique"?

In Section 3.3, under Recovering intrinsics, the paper states

hence only the focal $f_1^*$ remains to be estimated.

Should this say "focal length $f_1^*$"?

Moreover, equation (1) states

$$X^{n, m} = P_m P_n^{-1} h (X^n)$$ with $P_m, P_n \in \mathbb{R}^{3 \times 4}$ the world-to-camera poses for images $n$ and $m$ ...

Maybe this is just me nitpicking, but for the matrix inverse $P_n^{-1}$ to exist, $P_n$ would need to be square.

Am I correct in assuming that $P_n^{-1}$ is the top $3 \times 4$ submatrix (the first three rows) of the inverse of the $4 \times 4$ matrix that stacks $P_n$ on top of the row vector $[0, 0, 0, 1]$?

Vincent Leroy · Answer 1 · Wed Apr 10 2024 22:41:00 GMT+0800 (China Standard Time)

Thanks for picking up the typos!
Regarding the last point: yes, the world2cam poses are usually $3 \times 4$ matrices, and you can convert them to homogeneous $4 \times 4$ form before inversion. You could also invert the rotation and translation parts manually:
$$P = [R \mid t] \;\rightarrow\; P^{-1} = [R^T \mid -R^T t]$$
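
For anyone who wants to check this numerically, here is a minimal NumPy sketch of both routes. It is not taken from the paper's code: the helper names (`to_homogeneous`, `invert_pose_homogeneous`, `invert_pose_closed_form`) and the example pose are made up for illustration, and it assumes $R$ is a proper rotation matrix so that $R^{-1} = R^T$.

```python
import numpy as np


def to_homogeneous(P):
    """Stack a 3x4 pose [R | t] on top of [0, 0, 0, 1] to get a 4x4 matrix."""
    return np.vstack([P, [0.0, 0.0, 0.0, 1.0]])


def invert_pose_homogeneous(P):
    """Invert via the 4x4 homogeneous form, then keep the first three rows."""
    return np.linalg.inv(to_homogeneous(P))[:3, :]


def invert_pose_closed_form(P):
    """Invert via [R^T | -R^T t]; assumes R is orthonormal."""
    R, t = P[:, :3], P[:, 3]
    return np.hstack([R.T, (-R.T @ t)[:, None]])


# Hypothetical example pose: rotation about the z-axis plus a translation.
theta = 0.3
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t = np.array([1.0, -2.0, 0.5])
P = np.hstack([R, t[:, None]])

inv_a = invert_pose_homogeneous(P)
inv_b = invert_pose_closed_form(P)

# Both routes should agree up to floating-point error.
agree = np.allclose(inv_a, inv_b)

# Composing the pose with its inverse (in 4x4 form) should give the identity.
composed = to_homogeneous(P) @ to_homogeneous(inv_b)
```

The closed form avoids a general matrix inversion, but it is only valid when the left $3 \times 3$ block really is a rotation; the homogeneous route makes no such assumption.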

This practice seems standard enough to keep the text as is.