CBIR using VGG-RMAC feature

1. Results

1.1 image retrieval and object localization for No. 1 - 5 query (top_k = 3)

The following table has 4 rows, 5 columns, each column is for one query. In each column, the first one is query image, followed with 3 retrieved ones. You can click images to see full size image in result directory.

For more results (top_k=10), Please go to Demo.ipynb.

Q1	Q2	Q3	Q4	Q5

1.2 quantitative results on validation data

TBA.

2. How to use

Step-by-step tutorial: Demo.ipynb

Dataset(google drive): pg_data | supplementary

If you want to further develop based on this repository, you may refer to section 4 (not finished yet): design, implementation, features and discussion.

3. Methodology

Methodology introduction: [github] | [my blog] (recomended, for better view of mathematical notations and flowchart)

4. Design, implementation, discussion

4.1 design

SearchEngine

build()
- get db_feature_mat computed or loaded to memory.
retrieve_img()
- image-level content retrieval.
retrieve_object()
- object-level content retrieval.similar to image-level retrieval, but will preprocess image by masking * image with object bounding boxes and locate objects on top ranked images (if needed).
index_new_img()[pending]
- index new images on the fly

FeatureExtractor

compute_im_feature()
- compute image-level feature embedding
compute_top_matches()
- compute top ranked image to retreieve
compute_bb_mat()
- locate objects in retrieved image
_map_im_path_to_cache_path()
- map image path string to corresponding cached feature path string
get_im_feature_by_img_path()
- read cached image feature if found, otherwise compute and then cache it
get_db_feature_matrix()
- read database feature matrix if found, otherwise compute and ten cache it

Currently, the implementation is slightly different from the design:

Function names
BOWFeatureExtractor is not implemented yet, coming soon. SIFT-BOW-CIBR

4.2 implementation

When doing object localization, we apply level supression, since I observed that otherwise the localization will prefer larger window. see line xx in xx.
The RMAC implementaion is minimal and more efficient than existing implementations. Moreover, I providde other option on pooling method ('RAAC') and regional aggregation (average, not tested).
Experiments show that Level 1 pooling is important for retrieval. If initial scale is 2, the performance get worse (see dev notebook)

4.3 features

support index new image on the fly
support automatic caching
easy to extend framework

5. TODO

add table of contents in README.md
add dynamical indexing function
unify SIFT-BOW/SIFT-TFIDF to the this framework.
query expansion and reranking

About

Content based image retrieval, instance search (examplar object detection) using CNN, especially VGG-RMAC feature. Implemented with pytorch.

MIT License

Languages

Language:Jupyter Notebook 99.8%Language:Python 0.2%