- How to efficiently find k-nearest neighbours in high-dimensional data? - StackOverflow: kd-tree is not efficient for k-nearest-neighbour search when the dimensionality of the data is very high, but approximate alternatives (which search for approximate rather than exact k-nearest neighbours) can be applied. LSH is also mentioned.
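To make the LSH idea concrete, here is a minimal sign-random-projection sketch in plain numpy. This is a toy illustration, not any particular library's implementation; all data and names are made up for the example:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 128)         # stand-in database of 128-d descriptors
planes = rng.randn(16, 128)      # 16 random hyperplanes -> a 16-bit hash per vector
codes = X @ planes.T > 0         # sign pattern on each side of each hyperplane

q = X[42] + 0.01 * rng.randn(128)        # a query close to database item 42
q_code = planes @ q > 0
hamming = (codes != q_code).sum(axis=1)  # cheap code comparison instead of full distances
candidates = np.argsort(hamming)[:10]    # short list to re-rank with exact distances
```

Nearby vectors fall on the same side of most hyperplanes, so their codes agree in most bits; the exact distance computation is then only needed on the short candidate list.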
- Is kdtree used for speeding k-means clustering or not? - StackOverflow
- How to access object attribute given string corresponding to name of that attribute - StackOverflow
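The standard answer is the built-in getattr (with setattr as its counterpart). A toy example with a made-up class:

```python
class ImageRecord:
    def __init__(self):
        self.name = 'query.jpg'
        self.n_descriptors = 517

rec = ImageRecord()
attr = 'name'                          # attribute name held in a string
value = getattr(rec, attr)             # equivalent to rec.name
missing = getattr(rec, 'label', None)  # a default avoids AttributeError
```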
- Is sift algorithm invariant in color? - StackOverflow: SIFT only processes grayscale images; if the input image is not grayscale, it is first converted to grayscale internally.
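For reference, the conversion OpenCV applies internally (cv2.cvtColor with COLOR_BGR2GRAY) uses the standard luma weights Y = 0.299 R + 0.587 G + 0.114 B. The same formula in plain numpy, on random stand-in data:

```python
import numpy as np

bgr = np.random.RandomState(0).randint(0, 256, size=(4, 4, 3)).astype(np.float32)
# OpenCV stores channels in B, G, R order -> index 2 is the red channel
gray = 0.299 * bgr[..., 2] + 0.587 * bgr[..., 1] + 0.114 * bgr[..., 0]
```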
- CBIR: a Vocabulary Tree experiment on 10K images - Zhihu: a brief introduction to and experiment on the Vocabulary Tree. While it is fast, I think the vocabulary tree is still an approximate algorithm.
- How to speed up KMeans from sklearn - StackOverflow: MiniBatchKMeans is an option.
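A minimal sketch of the MiniBatchKMeans drop-in; random data stands in for the SIFT descriptors, and the parameter values are illustrative only:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.random.RandomState(0).rand(2000, 128).astype(np.float32)  # fake descriptors
km = MiniBatchKMeans(n_clusters=64, batch_size=256, n_init=3, random_state=0)
km.fit(X)                    # visual vocabulary lives in km.cluster_centers_
words = km.predict(X[:10])   # quantize descriptors to visual-word indices
```

It trades a little cluster quality for a large speedup by updating centroids from small random batches instead of full passes over the data.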
- hcmarchezi/vocabulary_tree - github: Python vocabulary tree from scratch.
- snavely/VocabTree2 - github: C++ vocabulary tree.
- Python Bag of Words clustering - StackOverflow: related to cv2.BOWKMeansTrainer; I don't know whether it is better than sklearn's KMeans implementation.
- Precomputed matrix for fitting with scikit neighbors/radius classification: explains why sklearn's NearestNeighbors optionally accepts a precomputed distance matrix.
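A small sketch of passing a precomputed distance matrix to sklearn's NearestNeighbors via metric='precomputed'; the data is random and the distances are plain Euclidean computed by hand:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(50, 8)
# precompute all pairwise Euclidean distances ourselves
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
nn = NearestNeighbors(n_neighbors=3, metric='precomputed').fit(D)
dist, idx = nn.kneighbors(D[:1])  # query row 0 against the fitted index
```

This is useful when the metric is expensive or exotic: you compute the matrix once however you like, and sklearn only does the neighbour lookup.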
- willard-yuan/cnn-cbir-benchmark - github: CBIR benchmark including SIFT retrieval and some improvements.
- How to parallelize SIFT feature extraction in Python? The easiest way is using joblib.Parallel; however, with the default settings, the following error may occur:

```
TypeError: can't pickle cv2.xfeatures2d_SIFT objects
```

I solved this problem by simply setting the backend argument to 'threading':

```python
cluster_des_list = Parallel(n_jobs=self.n_jobs, backend='threading')(
    delayed(self._aux_func_pt)(im_name)
    for im_name in tqdm(cluster_im_names)
)
```

There are also some other solutions I didn't try.
- Will joblib.Parallel keep the order of the data, as if the code were executed serially? Since I'm matching image names by the row index of the feature matrix, the order of the image name list and the rows of the feature matrix must stay the same. Fortunately, I found that the order is kept when I use joblib.Parallel. I used the following snippet to confirm this:

```python
from joblib import Parallel, delayed
import numpy as np

NUM = range(1000)
EXPECTED = [np.sqrt(x) for x in NUM]
for it in range(100):
    rnum = Parallel(n_jobs=-1, backend='threading')(delayed(np.sqrt)(x) for x in NUM)
    if not (rnum == EXPECTED):
        zped = zip(rnum, EXPECTED)
        print('Discrepancy in iteration %d' % (it))
        print([(x, ex) for (x, ex) in zped if x != ex])
        break
else:
    print('Order kept.')
```

Note that I'm using joblib 0.14.1. Someone reported that the results can be out of order; see the github issue here.
- How to indicate the progress of feature extraction, especially in the multiprocessing block? tqdm is surprisingly powerful, and even works in multiprocessing scenarios. The usage is also easy:

```python
cluster_des_list = Parallel(n_jobs=self.n_jobs, backend='threading')(
    delayed(self._aux_func_pt)(im_name)
    for im_name in tqdm(cluster_im_names)
)
```

Simply replacing `cluster_im_names` with `tqdm(cluster_im_names)` gives you a nice progress bar.
- How to draw a bounding box of the detected object on the retrieved image?
In short, this is achieved by homography warping: just warp the source bounding box onto the retrieved image. You may refer to How to draw bounding box on best matches?. An official document can be found here: Feature Matching + Homography to find Objects.
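The warping step itself is just applying the 3x3 homography to the box corners; cv2.perspectiveTransform does exactly this, and below is the same math in plain numpy. The image size is hypothetical, and an identity matrix stands in for the homography you would actually get from cv2.findHomography:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of 2-d points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # perspective divide

h, w = 480, 640                                       # hypothetical query image size
corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float64)
H = np.eye(3)                                         # stand-in homography
box = warp_points(H, corners)
# then e.g. cv2.polylines(retrieved_img, [np.int32(box.reshape(-1, 1, 2))], True, (0, 255, 0), 3)
```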
- Which norm should be used for local descriptor matching?
According to Which norm is the best to match descriptors?, it depends not on the images, but on the descriptors you use:
- for binary descriptors, like ORB, BRIEF, BRISK, you must use the HAMMING norm; the descriptors are bitstrings, not numbers
- for float descriptors, like SURF or SIFT, use any of L1, L2, L2sqr (L2 probably works best)
- the Feature2D class also has a defaultNorm() member, which you can query.
If you match SIFT descriptors with the HAMMING norm, there will be an error:

```
error: OpenCV(3.4.2) /tmp/build/80754af9/opencv-suite_1535558553474/work/modules/core/src/batch_distance.cpp:245: error: (-215:Assertion failed) (type == 0 && dtype == 4) || dtype == 5 in function 'batchDistance'
```
- switch to a multiple-image bf matcher; see an example here: BFMatcher match in OpenCV throwing error
- other ways to compute similarity (especially asymmetric ones, e.g. KL-divergence)
- vocabulary tree
- use cv2 FLANN to replace sklearn KNN. See an example here
- inverted index
- support dynamically indexing new images
- memory-thrifty solution?
- other feature support? (e.g. surf-bow, CNN feature)