QinLab-WFU / DSPH

Source code for TCSVT paper “Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval”

This paper was accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). If you have any questions, please contact hyd199810@163.com.

Dependencies

The code is written in Python. You need to install the following packages to run it (a sample install command is given after the list):

  • pytorch 1.12.1
  • sklearn
  • tqdm
  • pillow
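
For reference, an install along these lines should work; the exact PyTorch build depends on your CUDA setup, and sklearn is installed under the name scikit-learn:

python -m pip install torch==1.12.1 scikit-learn tqdm pillow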

Training

Processing dataset

Before training, you need to download the original data: COCO (the 2017 train and val images plus annotations), NUS-WIDE (all files), and MIRFLICKR-25K from Baidu (extraction code: u9e1) or Google Drive (including mirflickr25k and mirflickr25k_annotations_v080). Then use the "data/make_XXX.py" scripts to generate the .mat files.

After all the .mat files have been generated, the dataset directory should look like this (a small script to sanity-check the generated files follows the tree):

dataset
├── base.py
├── __init__.py
├── dataloader.py
├── coco
│   ├── caption.mat 
│   ├── index.mat
│   └── label.mat 
├── flickr25k
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
└── nuswide
    ├── caption.txt  # Notice! It is a txt file!
    ├── index.mat 
    └── label.mat
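
A minimal sketch to verify the generated files before training, assuming scipy is available. The variable names stored in each .mat file are set by the make_XXX.py scripts, so this just prints whatever keys those scripts wrote:

import scipy.io as sio

# Print the shape of every variable in each generated .mat file.
# For nuswide, remember that caption is a .txt file, not a .mat file.
for name in ("caption", "index", "label"):
    mat = sio.loadmat(f"dataset/coco/{name}.mat")
    shapes = {k: v.shape for k, v in mat.items() if not k.startswith("__")}
    print(name, shapes)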

Download CLIP pretrained model

The pretrained model download URLs can be found around line 30 of CLIP/clip/clip.py. This code is based on "ViT-B/32".

You should download ViT-B-32.pt and copy it to the repository root (the path passed to --clip-path below).
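
To confirm the checkpoint was downloaded and copied intact, you can try loading it. OpenAI distributes the CLIP checkpoints as TorchScript archives, so a successful torch.jit.load is a reasonable sanity check (a sketch, assuming the file sits in the repository root):

import torch

# ViT-B-32.pt is a TorchScript archive; if this load succeeds,
# the file is intact and usable via --clip-path.
model = torch.jit.load("./ViT-B-32.pt", map_location="cpu")
print(type(model))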

Start

After the dataset has been prepared, you can run the following command to train:

python main.py --is-train --dataset coco --caption-file caption.mat --index-file index.mat --label-file label.mat --lr 0.001 --output-dim 64 --save-dir ./result/coco/64 --clip-path ./ViT-B-32.pt --batch-size 128 --numclass 80
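
The other two datasets follow the same pattern. The --numclass values below are the usual label counts for these benchmarks (24 for MIRFLICKR-25K, 21 for NUS-WIDE); treat them as assumptions and verify them against your generated label.mat. Note that nuswide uses caption.txt instead of caption.mat:

python main.py --is-train --dataset flickr25k --caption-file caption.mat --index-file index.mat --label-file label.mat --lr 0.001 --output-dim 64 --save-dir ./result/flickr25k/64 --clip-path ./ViT-B-32.pt --batch-size 128 --numclass 24

python main.py --is-train --dataset nuswide --caption-file caption.txt --index-file index.mat --label-file label.mat --lr 0.001 --output-dim 64 --save-dir ./result/nuswide/64 --clip-path ./ViT-B-32.pt --batch-size 128 --numclass 21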

Citation

@ARTICLE{10149001,
author={Huo, Yadong and Qin, Qibing and Dai, Jiangyan and Wang, Lei and Zhang, Wenfeng and Huang, Lei and Wang, Chengduan},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Deep Semantic-Aware Proxy Hashing for Multi-Label Cross-Modal Retrieval},
year={2024},
volume={34},
number={1},
pages={576-589},
doi={10.1109/TCSVT.2023.3285266}}

Acknowledgements

DCHMT
