rayleizhu / SIFT-BOW-CBIR

content based image retrieval using SIFT-BOW feature.


SIFT-BOW-CBIR

Reference

Dev Notes

  1. How to parallelize SIFT feature extraction in Python? The easiest way is to use joblib.Parallel; however, with the default settings, the following error may occur:
TypeError: can't pickle cv2.xfeatures2d_SIFT objects

I solved this problem by simply setting the backend argument to 'threading':

cluster_des_list = Parallel(n_jobs=self.n_jobs, backend='threading')(
    delayed(self._aux_func_pt)(im_name) for im_name in tqdm(cluster_im_names)
)
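To see why the default (process-based) backend fails here: worker processes receive their arguments via pickling, and the SIFT detector object cannot be pickled, while threads share memory and need no pickling. A minimal stdlib sketch of the same idea (not the repo's code: threading.Lock stands in for the unpicklable SIFT object, and ThreadPoolExecutor stands in for joblib's threading backend):

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor

# A threading.Lock is unpicklable, like a cv2 SIFT object: process-based
# parallelism has to pickle shared objects to send them to workers.
detector_like = threading.Lock()

try:
    pickle.dumps(detector_like)
except TypeError as e:
    print('pickling fails:', e)

# Threads share the process's memory, so no pickling is needed.
def extract(name):
    with detector_like:      # the shared object is used directly
        return len(name)     # stand-in for SIFT descriptor extraction

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(extract, ['a.jpg', 'bb.jpg']))
print(results)  # [5, 6]
```

The trade-off is that threads are subject to the GIL; this still helps here because OpenCV releases the GIL inside its C++ routines.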

There are also some other solutions that I didn't try.

  2. Will joblib.Parallel keep the order of the data as if the code were executed serially?

Since I match image names by the row index of the feature matrix, the order of the image name list and of the feature matrix rows must stay the same. Fortunately, I found that the order is preserved when using joblib.Parallel. I used the following snippet to confirm this:

from joblib import Parallel, delayed
import numpy as np

NUM = range(1000)
EXPECTED = [np.sqrt(x) for x in NUM]
for it in range(100):
    # run the same computation in parallel and compare against the serial result
    rnum = Parallel(n_jobs=-1, backend='threading')(delayed(np.sqrt)(x) for x in NUM)
    if rnum != EXPECTED:
        print('Discrepancy in iteration %d' % it)
        print([(x, ex) for (x, ex) in zip(rnum, EXPECTED) if x != ex])
        break
    else:
        print('Order kept.')

Note that I'm using joblib 0.14.1. Some people have reported that the results can be out of order; see the GitHub issue here.

  3. How to indicate the progress of feature extraction, especially in a multiprocessing block? tqdm is surprisingly powerful and even works in multiprocessing scenarios. The usage is also easy:
cluster_des_list = Parallel(n_jobs=self.n_jobs, backend='threading')(
    delayed(self._aux_func_pt)(im_name) for im_name in tqdm(cluster_im_names)
)

Simply replacing cluster_im_names with tqdm(cluster_im_names) gives you a nice progress bar.

  4. How to draw a bounding box of a detected object on a retrieved image?
    In short, this is achieved by homography warping: just warp the source bounding box onto the retrieved image. You may refer to How to draw bounding box on best matches?. Official documentation can be found here: Feature Matching + Homography to find Objects.
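The warping step amounts to applying the 3x3 homography H (estimated from the matched keypoints, e.g. with cv2.findHomography) to the box corners in homogeneous coordinates; cv2.perspectiveTransform does exactly this. A minimal numpy sketch of the underlying math (the translation matrix and corner values below are made up for illustration):

```python
import numpy as np

def warp_box(H, corners):
    """Apply a 3x3 homography H to (N, 2) corner points."""
    pts = np.hstack([corners, np.ones((len(corners), 1))])  # homogeneous coords
    out = (H @ pts.T).T
    return out[:, :2] / out[:, 2:3]  # divide out the projective scale

# A pure-translation homography shifts the box by (10, 20).
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0,  1.0]])
box = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 80.0], [0.0, 80.0]])
print(warp_box(H, box))  # each corner shifted by (10, 20)
```

The warped corners can then be drawn on the retrieved image with cv2.polylines.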

  5. Which norm should be used for local descriptor matching?

It depends not on the images, but on the descriptors you use.

For binary descriptors, like ORB, BRIEF, or BRISK, you must use the HAMMING norm; the descriptors are bitstrings, not numbers.

For float descriptors, like SURF or SIFT, use any of L1, L2, L2sqr (L2 probably works best).

The Feature2D class also has a defaultNorm() member, which you can query.

The above is according to Which norm is the best to match descriptors?. If you match SIFT descriptors with the HAMMING norm, you get an error:

error: OpenCV(3.4.2) /tmp/build/80754af9/opencv-suite_1535558553474/work/modules/core/src/batch_distance.cpp:245: error: (-215:Assertion failed) (type == 0 && dtype == 4) || dtype == 5 in function 'batchDistance'
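For SIFT this means constructing the matcher with cv2.NORM_L2 (e.g. cv2.BFMatcher(cv2.NORM_L2)). What brute-force L2 matching with Lowe's ratio test does under the hood can be sketched in plain numpy (the random descriptors here are stand-ins for real SIFT output, and match_l2 is my own helper, not an OpenCV function):

```python
import numpy as np

def match_l2(des1, des2, ratio=0.75):
    """Brute-force L2 matching with Lowe's ratio test.

    Returns (i, j) pairs: row i of des1 matched to row j of des2.
    """
    # pairwise squared L2 distances, shape (len(des1), len(des2))
    d2 = ((des1[:, None, :] - des2[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        j1, j2 = np.argsort(row)[:2]          # best and second-best neighbors
        if row[j1] < (ratio ** 2) * row[j2]:  # keep only unambiguous matches
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(0)
des2 = rng.normal(size=(50, 128))          # fake database descriptors
des1 = des2[:5] + 0.01 * rng.normal(size=(5, 128))  # perturbed copies
print(match_l2(des1, des2))  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

With binary descriptors the same ratio-test logic applies, but the distance would be the Hamming distance over bitstrings, which is why OpenCV refuses the mismatched norm/descriptor-type combination above.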

TODO

  1. Switch to a multi-image brute-force matcher; see the example here: BFMatcher match in OpenCV throwing error.
  2. Other ways to compute similarity (especially asymmetric ones, e.g. KL divergence).
  3. Vocabulary tree.
  4. Use cv2 FLANN to replace sklearn KNN. See an example here.
  5. Inverted index.
  6. Support dynamically indexing new images.
  7. Memory-efficient solution?
  8. Support for other features? (e.g. SURF-BOW, CNN features)
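For TODO item 2, an asymmetric similarity between two BoW histograms could look like the following sketch (smoothed KL divergence; the smoothing constant eps and the toy histograms are arbitrary choices for illustration):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two BoW count histograms; asymmetric by design."""
    p = np.asarray(p, dtype=float) + eps  # smooth to avoid log(0) / division by 0
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()                          # normalize counts to distributions
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

query = [3, 0, 1, 2]   # visual-word counts of a query image
db_im = [2, 1, 1, 2]   # visual-word counts of a database image
print(kl_divergence(query, db_im))
print(kl_divergence(db_im, query))  # different: the measure is asymmetric
```

Since KL divergence is a dissimilarity (0 for identical histograms, larger when they differ), ranking would sort retrieved images by ascending divergence.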
