This repository is the official implementation of *Deep Metric Learning with Spherical Embedding* for the deep metric learning (DML) task.
It supports training with vanilla triplet loss / semihard triplet loss / normalized N-pair loss (tuplet loss) / multi-similarity loss on the CUB200-2011 / Cars196 / SOP / In-Shop datasets.
This repo was tested with Ubuntu 16.04.1 LTS, Python 3.6, PyTorch 1.1.0, and CUDA 10.1.
Requirements: torch==1.1.0, tensorboardX
Prepare the datasets and the pretrained BN-Inception model.
Download the datasets (CUB200-2011, Cars196, SOP, In-Shop), then unzip and organize them as follows:
```
datasets
├── split_train_test.py
├── CUB_200_2011
│   ├── images.txt
│   └── images
│       ├── 001.Black_footed_Albatross
│       └── ...
├── CARS196
│   ├── cars_annos.mat
│   └── car_ims
│       ├── 000001.jpg
│       └── ...
├── SOP
│   └── Stanford_Online_Products
│       ├── Ebay_train.txt
│       ├── Ebay_test.txt
│       ├── bicycle_final
│       └── ...
└── Inshop
    ├── list_eval_partition.txt
    └── img
        ├── MEN
        ├── WOMEN
        └── ...
```
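Before generating the lists, it can help to verify the layout above is in place. The helper below is a hypothetical sketch (not part of this repo) that checks for the key files and folders:

```python
import os

# Key entries from the layout above; purely illustrative, not the repo's API.
EXPECTED = {
    "CUB_200_2011": ["images.txt", "images"],
    "CARS196": ["cars_annos.mat", "car_ims"],
    "SOP": ["Stanford_Online_Products"],
    "Inshop": ["list_eval_partition.txt", "img"],
}

def missing_entries(root):
    """Return the expected dataset entries that are absent under `root`."""
    missing = []
    for dataset, entries in EXPECTED.items():
        for entry in entries:
            if not os.path.exists(os.path.join(root, dataset, entry)):
                missing.append(os.path.join(dataset, entry))
    return missing
```

Running `missing_entries("datasets")` should return an empty list once everything is unzipped in place.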
Then run `split_train_test.py` to generate the training and testing lists.
Download the ImageNet-pretrained BN-Inception model and put it into `./pretrained_models`.
To train the model(s) in the paper, run the following commands or use `sh mytrain.sh`.

Train models with vanilla triplet loss:
```
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
    --lr_gamma 0.1 --use_loss triplet
```
Train models with vanilla triplet loss + SEC:

```
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
    --lr_gamma 0.1 --use_loss triplet --sec_wei 1.0
```
Train models with vanilla triplet loss + L2-reg:

```
CUDA_VISIBLE_DEVICES=0 python train.py --use_dataset CUB --instances 3 --lr 0.5e-5 --lr_p 0.25e-5 \
    --lr_gamma 0.1 --use_loss triplet --l2reg_wei 1e-4
```
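For intuition, the two regularizers weighted by `--sec_wei` and `--l2reg_wei` can be sketched as below. This is an illustrative reimplementation of the idea, not the repo's exact code; see the paper for the precise formulation:

```python
import torch

def sec_term(embeddings):
    """SEC sketch: penalize the variance of embedding norms, pulling all
    embeddings toward a common sphere."""
    norms = embeddings.norm(p=2, dim=1)
    return ((norms - norms.mean()) ** 2).mean()

def l2reg_term(embeddings):
    """Plain L2 regularization on the unnormalized embeddings."""
    return embeddings.pow(2).sum(dim=1).mean()

# Usage sketch (names illustrative):
#   total = metric_loss + sec_wei * sec_term(z)        # --sec_wei 1.0
#   total = metric_loss + l2reg_wei * l2reg_term(z)    # --l2reg_wei 1e-4
```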
Similarly, set `--use_loss` to `semihtriplet` / `n-npair` / `ms` and `--instances` to 3 / 2 / 5 to train models with semihard triplet loss / normalized N-pair loss / multi-similarity loss, respectively. Set `--use_dataset` to `Cars` / `SOP` / `Inshop` to train models on the other datasets.
The detailed settings of the above hyper-parameters are provided in Appendix B of our paper, with two exceptions to the learning-rate settings listed below (format: lr / lr_p / lr_gamma @ decay steps):
(a) multi-similarity loss without SEC/L2-reg on CUB: 1e-5/0.5e-5/0.1@3k, 6k
(b) multi-similarity loss without SEC/L2-reg on Cars: 2e-5/2e-5/0.1@2k
(We find that using a larger learning rate harms the original loss function.)
When training on a different dataset or with a different loss function, we only need to modify the hyper-parameters in the above commands and the head settings. Only when using multi-similarity loss without SEC/L2-reg do we need to set `need_bn=False` in line 24 of `learner.py`:

```
self.model = torch.nn.DataParallel(BNInception(need_bn=False)).cuda()
```
Additionally, to use SEC with the EMA method, set `--norm_momentum <value>`, where `norm_momentum` denotes $\rho$ in Appendix D of our paper.
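As a sketch of the EMA variant (illustrative names, not the repo's API), a running average of embedding norms is tracked with momentum $\rho$ and each norm is pulled toward it:

```python
import torch

class SECWithEMA:
    """Illustrative sketch of SEC with an EMA-tracked average norm;
    `rho` plays the role of --norm_momentum."""

    def __init__(self, rho=0.9):
        self.rho = rho
        self.mu = None  # running average of embedding norms

    def __call__(self, embeddings):
        norms = embeddings.norm(p=2, dim=1)
        batch_mean = norms.mean().detach()
        if self.mu is None:
            self.mu = batch_mean  # initialize on the first batch
        else:
            self.mu = self.rho * self.mu + (1 - self.rho) * batch_mean
        # Penalize deviation of each norm from the running average norm
        return ((norms - self.mu) ** 2).mean()
```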
Evaluating NMI and F1 on SOP is time-consuming, so we conduct it only after the training process (only R@K is evaluated during training). In particular, run:
```
CUDA_VISIBLE_DEVICES=0 python test_sop.py --use_dataset SOP --test_sop_model SOP_xxxx_xxxx
```
or use `sh test_sop.sh` for a complete test of NMI, F1, and R@K on SOP. Here `SOP_xxxx_xxxx` is the model to be tested, which can be found in `./work_space`.
For the other three datasets, NMI, F1, and R@K are evaluated during the training process.
Our model achieves the following performance on CUB200-2011, Cars196, SOP, and In-Shop datasets:
If you find this repo useful for your research, please consider citing our paper:
```
@article{zhang2020deep,
  title={Deep Metric Learning with Spherical Embedding},
  author={Zhang, Dingyi and Li, Yingming and Zhang, Zhongfei},
  journal={arXiv preprint arXiv:2011.02785},
  year={2020}
}
```