SKS & ACA Decomposition of 2D Homography
This repository is the official implementation of the paper:
Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations.
Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) are novel decomposition forms for 2D homography (projective transformation) matrices, which are superior to previous 4-point homography solvers (NDLT-SVD, HO-SVD, GPT-LU, RHO-GE) in terms of computational efficiency, geometrical meaning of parameters, and unified management for various planar configurations. The uploaded codes include the Matlab, C++ (with OpenCV or CUDA library) and Python (with PyTorch library) procedures used in CPU and GPU experiments.
[Project Page] [Paper] [Video] (Coming soon)
SKS Decomposition
SKS decomposes a 2D homography into three sub-transformations:
where
ACA Decomposition
ACA also decomposes a 2D homography into three sub-transformations:
where
Geometric Meanings
In SKS and ACA, each sub-transformation, and even each parameter of these transformations has geometric meaning. The whole decomposition process of SKS follows the philosophy of symmetry as shown below.
This figure introduces one kind of further decomposition of the kernel transformation,
The whole decomposition process of ACA is more concise as follows.
Algebraic Simplicity
SKS and ACA exhibit many unique properties in algebra, some of which are shown below.
No Need to Construct A Linear System of Equations
Previous 4-point homography methods (NDLT-SVD, HO-SVD, GPT-LU, RHO-GE) follow the same way to construct a square system of linear equations, followed by solving it through well-established matrix factorization methods, such as SVD and LU,
Such approachs are circuitous since the constructed coefficient matrix
Division-Free Solver
ACA is extremely concise in algebra and only requires 85 addtions, subtractions and multiplications of floating-point numbers to compute homographies up to a scale. Among four arithmetic operations, the most complicated division is avoided in ACA.
Floating-point Operations (FLOPs)
FLOPs of SKS and ACA for computing 4-point homographies up to a scale are 157 and 85 respectively. With the normalization based on the last element of homography (which costs 12 extra FLOPs), FLOPs of SKS and ACA are 169 and 97 respectively. Compared with commonly used robust 4-point homography solvers NDLT-SVD ($\ge$27K FLOPs) and GPT-LU (~1950 FLOPs), SKS and ACA represent {162x, 12x} and {282x, 20x}, respectively.
Polynomial Expression of Each Element of Homography
Owing to the extremely simple expression of each sub-transformation, we can represent each element of a homography by the input variables (
where
Homographies Mapping A Rectangle to A Quadrangle
All previous 4-point offsets based deep homography methods compute the homography mapping a square (UDHN_RAL18, DHDS_CVPR20, LocalTrans_ICCV21, DAMG_TCSVT22, IDHN_CVPR22) or rectangle (UDIS_TIP21,CAUDHN_ECCV20) in source image to a general quadrangle in target image. However, the previous methods treat the special rectanlge as a general quadrangle and no simplification is conducted. In SKS and ACA, homographies mapping a rectangle (or square) to a quadrangle are simplified straightforwardly. The complete steps of the tensorized ACA (TensorACA) for a rectangle are illustrated in the following Algorithm with only 15 vector operations (47 FLOPs). Consequently, FLOPs for a source square will be reduced to 14 vector operations (44 FLOPs).
A Unified Way to Decompose and Compute Affine Transformations
Affine transformations, as one kind of degenerate projective transformations, can also be managed by SKS and ACA in a unified way. Therefore, FLOPs of computing affine transformations with three points is significantly reduced, especially compared to the GPT-LU method (see OpenCV's function 'getAffineTransform').
Experiments
CPU Runtime
Compared to the three robust methods (NDLT-SVD, HO-SVD and GPT-LU), SKS(**) and ACA(**) represents {488x, 477x, 29x} and {731x, 713x, 43x} respectively under 'O2' compiler optimization. These numbers are bigger than FLOPs ratios as their implementations in OpenCV include more or less conditional branch judgments, data copy or exchange, OpenCV data structures, etc., which may severely influence the speed.
GPU Runtime
Runtime of multiple homographies computation in parallel on GPU is meaningful for both the feature-based RANSAC pipelines and the deep homography pipelines. Specifically, each 4-point homography is assigned to one thread of GPU for computation and the program statements will be sequentially executed by a GPU CUDA core. The total runtime of all algorithms for small numbers ($\leq$10K) of homographies in Table increases slightly with the increase of the numbers. This is because the parallel computation of small numbers of homographies don't trigger all 10496 CUDA cores of NVIDIA 3090 GPU.
From the tables, the fastest ACA algorithm can run about 70M and 4G times on the tested CPU and GPU respectively.