The cmrf package provides a Python implementation of our winning solution [1] for the MSR-Bing Image Retrieval Challenge, held in conjunction with ACM Multimedia 2015.
- Six individual methods (i.e., Image2text, Text2image, PSI, DeViSE, ConSE and Parzen window).
- Learning optimized weights for relevance fusion.
- Cross-platform support (Linux, Mac, Windows).
- Download the dataset without image visual features.
- Download image visual features [ required (5.0GB) | optional (7.9GB) ].
- Add simpleknn to `PYTHONPATH`.
- Change `ROOT_PATH` in basic/common.py to the local folder where the dataset is stored.
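For instance, the two setup steps above might look like this (paths are illustrative; adjust them to your checkout and data location):

```shell
# Make the simpleknn module importable (run from the cmrf checkout).
export PYTHONPATH=$PYTHONPATH:$(pwd)/simpleknn

# Then, in basic/common.py, point ROOT_PATH at the dataset folder, e.g.:
#   ROOT_PATH = '/home/user/VisualSearch'
```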
Images and queries are of two distinct modalities, so they have to be represented in a common space before cross-media relevance can be computed. Our package implements six individual methods for cross-media relevance computation and a late-fusion method for cross-media relevance fusion.
##### Individual training methods
- PSI: use stochastic gradient descent with mini-batches to minimize the margin ranking loss of the PSI model.
- DeViSE: use stochastic gradient descent with mini-batches to minimize the margin ranking loss of the DeViSE model.
- The other methods require no training.
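As a rough illustration of this training scheme (not the package's actual PSI/DeViSE trainer), the following sketch runs mini-batch SGD on a margin ranking loss for two linear maps that project queries and images into a shared latent space; all names and hyperparameters here are illustrative:

```python
import numpy as np

def margin_ranking_sgd(queries, pos_imgs, neg_imgs, dim=50, margin=1.0,
                       lr=0.01, epochs=10, batch_size=32, seed=0):
    """Mini-batch SGD on a margin ranking loss (illustrative sketch).

    Two linear maps (W_q for queries, W_i for images) project both
    modalities into a shared latent space; we push the score of a
    relevant image above that of an irrelevant one by at least `margin`.
    """
    rng = np.random.default_rng(seed)
    q_dim, i_dim = queries.shape[1], pos_imgs.shape[1]
    W_q = rng.normal(scale=0.1, size=(q_dim, dim))
    W_i = rng.normal(scale=0.1, size=(i_dim, dim))
    n = len(queries)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            q = queries[idx] @ W_q          # queries in latent space
            p = pos_imgs[idx] @ W_i         # relevant images
            m = neg_imgs[idx] @ W_i         # irrelevant images
            # hinge loss: max(0, margin - s(q,p) + s(q,m)), s = dot product
            viol = (margin - np.sum(q * p, 1) + np.sum(q * m, 1)) > 0
            b = viol.sum()
            if b == 0:
                continue
            qv = q[viol]
            pv, mv = pos_imgs[idx][viol], neg_imgs[idx][viol]
            # gradients of the hinge term w.r.t. the two projections
            grad_q = queries[idx][viol].T @ (mv @ W_i - pv @ W_i)
            grad_i = mv.T @ qv - pv.T @ qv
            W_q -= lr * grad_q / b
            W_i -= lr * grad_i / b
    return W_q, W_i
```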
##### Individual test methods
- Image2text: project the image into the Bag-of-Words space.
- Text2image: project the query into the visual feature space.
- PSI: project image and query (Bag-of-Words features) into a learned latent space.
- DeViSE: project image and query (word2vec features) into a learned latent space.
- ConSE: project image and query into a learned word2vec space.
- Parzen window: an extreme case of Text2image.
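Once image and query live in the same space, a relevance score can be obtained with a simple vector similarity; the cosine similarity below is one common choice and is shown only as an illustration of this scoring step (the function name is ours):

```python
import numpy as np

def cross_media_relevance(img_vec, query_vec):
    """Cosine similarity between an image and a query after both have
    been projected into a common space (illustrative scoring step)."""
    denom = np.linalg.norm(img_vec) * np.linalg.norm(query_vec)
    return float(img_vec @ query_vec / denom) if denom else 0.0
```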
##### Relevance fusion
- Weight optimization: employ Coordinate Ascent to learn optimized weights.
- Relevance fusion: fuse relevance scores from different methods with the optimized weights.
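To make the fusion step concrete, here is a minimal sketch of coordinate ascent over fusion weights: one weight is tuned at a time on a grid while the others stay fixed, maximizing whatever validation measure `eval_fn` computes. This is a generic illustration of the technique, not the package's implementation:

```python
import numpy as np

def coordinate_ascent(scores, eval_fn, grid=np.linspace(0, 1, 11),
                      rounds=5):
    """Learn fusion weights by coordinate ascent (illustrative sketch).

    scores  : (n_methods, n_pairs) array of per-method relevance scores.
    eval_fn : maps a fused score vector to a quality value to maximize
              (e.g. a retrieval metric on a validation set).
    """
    m = scores.shape[0]
    weights = np.ones(m) / m
    for _ in range(rounds):
        for i in range(m):          # optimize one coordinate at a time
            best_w, best_val = weights[i], -np.inf
            for w in grid:
                trial = weights.copy()
                trial[i] = w
                val = eval_fn(trial @ scores)   # fuse, then evaluate
                if val > best_val:
                    best_val, best_w = val, w
            weights[i] = best_w
    return weights / max(weights.sum(), 1e-12)  # normalize to sum to 1

# Fusion itself is then a weighted sum: fused = weights @ scores
```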
Please run doit_all.sh to check that everything is in place. If it runs successfully, the cross-media relevance of all the query-image pairs will be written to `result/final.result.txt`, and other intermediate results will also appear in the `result` folder.
- If you have not installed Theano, you can run doit_4.sh instead (it runs only Image2text, Text2image, ConSE and Parzen window).
- As a showcase, we only run 20 queries. If you want to run all 1000 queries from the Dev set, please rename 'qid.text.all.txt' in `/rootpath/msr2013dev/Annotations/` to 'qid.text.txt'. Note that this will take a while.
- If you would like to use your own dataset, we recommend organizing it in the same fixed structure as our data, which minimizes your coding effort.
- The package does not include any visual feature extractors. Features need to be pre-computed and converted to the required binary format using txt2bin.py.
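The exact binary layout expected by txt2bin.py is not described here, so the sketch below only illustrates the general idea of such a conversion: reading one `<image_id> f1 f2 ...` line per image and packing the values as little-endian float32 (both the function name and the assumed formats are ours):

```python
import struct

def txt_features_to_bin(txt_path, bin_path):
    """Pack whitespace-separated feature vectors into raw float32 binary.

    Assumes one '<image_id> f1 f2 ...' line per image; the id order is
    returned so the caller can keep an id-to-offset mapping. The real
    txt2bin.py may use a different layout.
    """
    ids = []
    with open(txt_path) as fin, open(bin_path, 'wb') as fout:
        for line in fin:
            parts = line.split()
            if not parts:
                continue
            ids.append(parts[0])
            vec = [float(x) for x in parts[1:]]
            fout.write(struct.pack('<%df' % len(vec), *vec))
    return ids
```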
[1] Jianfeng Dong, Xirong Li, Shuai Liao, Jieping Xu, Duanqing Xu, Xiaoyong Du. Image Retrieval by Cross-Media Relevance Fusion. ACM Multimedia 2015 (Multimedia Grand Challenge Session)