The cmrf package provides a Python implementation of our winning solution [1] for the MSR-Bing Image Retrieval Challenge, held in conjunction with ACM Multimedia 2015.
- Six individual methods (i.e., Image2text, Text2image, PSI, DeViSE, ConSE and Parzen window).
- Learning optimized weights for relevance fusion.
- Cross-platform support (Linux, Mac, Windows).
- Download the dataset without image visual features.
- Download image visual features [ required (5.0GB) | optional (7.9GB) ].
- Add simpleknn to `PYTHONPATH`.
- Change `ROOT_PATH` in basic/common.py to the local folder where the dataset is stored.
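For instance, the two setup steps above might look like this (paths are illustrative; adjust them to your checkout and data location):

```shell
# Make the simpleknn module importable (run from the cmrf checkout).
export PYTHONPATH=$PYTHONPATH:$(pwd)/simpleknn

# Then, in basic/common.py, point ROOT_PATH at the dataset folder, e.g.:
#   ROOT_PATH = '/home/user/VisualSearch'
```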
Images and queries are of two distinct modalities, so they have to be represented in a common space before cross-media relevance can be computed. Our package implements six individual methods for cross-media relevance computation and a late-fusion method for cross-media relevance fusion.
##### Individual training methods
- PSI: use stochastic gradient descent with mini-batches to minimize the margin ranking loss of the PSI model.
- DeViSE: use stochastic gradient descent with mini-batches to minimize the margin ranking loss of the DeViSE model.
- The other methods require no training.
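As a rough illustration of this training scheme (not the package's actual PSI/DeViSE trainer), the following sketch runs mini-batch SGD on a margin ranking loss for two linear maps that project queries and images into a shared latent space; all names and hyperparameters here are illustrative:

```python
import numpy as np

def margin_ranking_sgd(queries, pos_imgs, neg_imgs, dim=50, margin=1.0,
                       lr=0.01, epochs=10, batch_size=32, seed=0):
    """Mini-batch SGD on a margin ranking loss (illustrative sketch).

    Two linear maps (W_q for queries, W_i for images) project both
    modalities into a shared latent space; we push the score of a
    relevant image above that of an irrelevant one by at least `margin`.
    """
    rng = np.random.default_rng(seed)
    q_dim, i_dim = queries.shape[1], pos_imgs.shape[1]
    W_q = rng.normal(scale=0.1, size=(q_dim, dim))
    W_i = rng.normal(scale=0.1, size=(i_dim, dim))
    n = len(queries)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            q = queries[idx] @ W_q          # queries in latent space
            p = pos_imgs[idx] @ W_i         # relevant images
            m = neg_imgs[idx] @ W_i         # irrelevant images
            # hinge loss: max(0, margin - s(q,p) + s(q,m)), s = dot product
            viol = (margin - np.sum(q * p, 1) + np.sum(q * m, 1)) > 0
            b = viol.sum()
            if b == 0:
                continue
            qv = q[viol]
            pv, mv = pos_imgs[idx][viol], neg_imgs[idx][viol]
            # gradients of the hinge term w.r.t. the two projections
            grad_q = queries[idx][viol].T @ (mv @ W_i - pv @ W_i)
            grad_i = mv.T @ qv - pv.T @ qv
            W_q -= lr * grad_q / b
            W_i -= lr * grad_i / b
    return W_q, W_i
```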
##### Individual test methods
- Image2text: project the image into the Bag-of-Words space.
- Text2image: project the query into the visual feature space.
- PSI: project image and query (Bag-of-Words features) into a learned latent space.
- DeViSE: project image and query (word2vec features) into a learned latent space.
- ConSE: project image and query into a learned word2vec space.
- Parzen window: an extreme case of Text2image.
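Once image and query live in the same space, a relevance score can be obtained with a simple vector similarity; the cosine similarity below is one common choice and is shown only as an illustration of this scoring step (the function name is ours):

```python
import numpy as np

def cross_media_relevance(img_vec, query_vec):
    """Cosine similarity between an image and a query after both have
    been projected into a common space (illustrative scoring step)."""
    denom = np.linalg.norm(img_vec) * np.linalg.norm(query_vec)
    return float(img_vec @ query_vec / denom) if denom else 0.0
```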
##### Relevance fusion
- Weight optimization: employ Coordinate Ascent to learn optimized weights.
- Relevance fusion: fuse relevance scores from different methods with the optimized weights.
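To make the fusion step concrete, here is a minimal sketch of coordinate ascent over fusion weights: one weight is tuned at a time on a grid while the others stay fixed, maximizing whatever validation measure `eval_fn` computes. This is a generic illustration of the technique, not the package's implementation:

```python
import numpy as np

def coordinate_ascent(scores, eval_fn, grid=np.linspace(0, 1, 11),
                      rounds=5):
    """Learn fusion weights by coordinate ascent (illustrative sketch).

    scores  : (n_methods, n_pairs) array of per-method relevance scores.
    eval_fn : maps a fused score vector to a quality value to maximize
              (e.g. a retrieval metric on a validation set).
    """
    m = scores.shape[0]
    weights = np.ones(m) / m
    for _ in range(rounds):
        for i in range(m):          # optimize one coordinate at a time
            best_w, best_val = weights[i], -np.inf
            for w in grid:
                trial = weights.copy()
                trial[i] = w
                val = eval_fn(trial @ scores)   # fuse, then evaluate
                if val > best_val:
                    best_val, best_w = val, w
            weights[i] = best_w
    return weights / max(weights.sum(), 1e-12)  # normalize to sum to 1

# Fusion itself is then a weighted sum: fused = weights @ scores
```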
Please run doit_all.sh to check that everything is in place. If it runs successfully, the cross-media relevance of all the query-image pairs will be written to `result/final.result.txt`, and other intermediate results will also appear in the `result` folder.
- If you have not installed Theano, you can run doit_4.sh instead (it runs only Image2text, Text2image, ConSE and Parzen window).
- As a showcase, we only run 20 queries. If you want to run all 1000 queries from the Dev set, please rename 'qid.text.all.txt' in `/rootpath/msr2013dev/Annotations/` to 'qid.text.txt'. Note that this will take a while.
- If you would like to use your own dataset, we recommend organizing it in the same fixed structure as our data, which minimizes your coding effort.
- The package does not include any visual feature extractors. Features need to be pre-computed and converted to the required binary format using txt2bin.py.
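The exact binary layout expected by txt2bin.py is not described here, so the sketch below only illustrates the general idea of such a conversion: reading one `<image_id> f1 f2 ...` line per image and packing the values as little-endian float32 (both the function name and the assumed formats are ours):

```python
import struct

def txt_features_to_bin(txt_path, bin_path):
    """Pack whitespace-separated feature vectors into raw float32 binary.

    Assumes one '<image_id> f1 f2 ...' line per image; the id order is
    returned so the caller can keep an id-to-offset mapping. The real
    txt2bin.py may use a different layout.
    """
    ids = []
    with open(txt_path) as fin, open(bin_path, 'wb') as fout:
        for line in fin:
            parts = line.split()
            if not parts:
                continue
            ids.append(parts[0])
            vec = [float(x) for x in parts[1:]]
            fout.write(struct.pack('<%df' % len(vec), *vec))
    return ids
```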
[1] Jianfeng Dong, Xirong Li, Shuai Liao, Jieping Xu, Duanqing Xu, Xiaoyong Du. Image Retrieval by Cross-Media Relevance Fusion. ACM Multimedia 2015 (Multimedia Grand Challenge Session)