This is a C++ implementation of SIFT (Scale-Invariant Feature Transform) from the paper *Distinctive Image Features from Scale-Invariant Keypoints* (Lowe, 2004). OpenCV is used for its image container as well as some basic image-processing subroutines.
It currently supports matching between two images.
- To create the Makefile, run `cmake .`
- To compile, run `make`
- To match, run `./sift <image1> <image2>`
This implementation stays mostly true to Lowe's. The following is a summary of the procedure; for details, please refer to the paper (see references).
- Scale space: The difference of Gaussians (DoG) of the input image under different $\sigma$, called "scale", is used to approximate the scale-normalized Laplacian of Gaussian. $\sigma = 1.6$ is set according to the paper.
- Scale-space extrema: Local maxima and minima in the scale space are selected as feature candidates.
- Low contrast rejection: An accurate keypoint location is found by fitting a 3D quadratic function to the local sample points:
  $$\hat x = -\left(\frac{\partial^2 D}{\partial x^2}\right)^{-1} \frac{\partial D}{\partial x}, \qquad D(\hat x) = D + \frac{1}{2} \frac{\partial D}{\partial x}^T \hat x$$
  The DoG response at the new extremum is then checked: points with $|D(\hat x)| < 0.03 \times 255$ are discarded.
- Edge response elimination: Peaks in the difference of Gaussians along an edge have a large principal curvature across the edge but a small one in the perpendicular direction. The ratio $r$ of the principal curvatures can be computed from the 2nd-order derivatives, which are precomputed for the image:
  $$\frac{(D_{xx} + D_{yy})^2}{D_{xx} D_{yy} - D_{xy}^2} = \frac{(r+1)^2}{r} < \frac{(r_0+1)^2}{r_0}$$
  According to the paper, $r_0 = 10$ is used.
- Orientation assignment: The gradient orientations of the Gaussian-blurred image around a keypoint are used to assign an orientation to the keypoint; the gradient magnitudes are used as weights.
- Creating descriptors:
  - Gradient orientations are used again to create the keypoint descriptor.
  - However, the orientations are shifted relative to the orientation of the keypoint.
  - The orientations are divided into 8 bins and accumulated over each 4x4 sub-region.
  - According to the paper, 16 such histograms are computed, creating an 8x16 = 128-dimensional vector for each keypoint found.
  - The vectors are normalized to unit length.
- Descriptor matching
  - For now, an exhaustive search is used to match the features.
  - The paper uses the Euclidean distance between vectors to determine a match. Only matches in which the ratio of the closest distance to the second-closest one is less than 0.8 are accepted as correct. In other words, with $v_1$ the closest and $v_2$ the second-closest descriptor to $v$,
    $$\frac{\|v - v_1\|}{\|v - v_2\|} < 0.8$$
  - To simplify the calculation, we use the inner product instead. More specifically,
    $$\frac{1 - v^T v_1}{1 - v^T v_2} < 0.64$$
  - To derive this, square the original inequality and use the fact that all vectors have unit length, so $\|v - v_i\|^2 = 2 - 2\,v^T v_i$:
    $$\left(\frac{\|v - v_1\|}{\|v - v_2\|}\right)^2 < 0.8^2 \;\Rightarrow\; \frac{1 - v^T v_1}{1 - v^T v_2} < 0.64$$
- Distinctive Image Features from Scale-Invariant Keypoints. Lowe, 2004.
- OpenCV 4