wangwen39 / CMHN

Cross-Modal Hashing for Efficiently Retrieving Moments in Videos



CMHN


We propose an end-to-end Cross-Modal Hashing Network, dubbed CMHN, to efficiently retrieve target moments within a given video via natural language queries.
Specifically, it first adopts a dual-path neural network to learn feature representations for the video and the query respectively, and then applies a cross-modal hashing strategy to guide the learning of the corresponding hash codes.
Put simply, our proposed model jointly considers discriminative feature learning and effective cross-modal hashing.
Moreover, we conduct extensive experiments on two public datasets, ActivityNet Captions and TACoS. The experimental results show that our proposed model is more effective, efficient, and scalable than state-of-the-art models.
A detailed introduction to CMHN will be provided in an authorized patent and a published paper within half a year.
An illustration of the cross-modal moment retrieval and the framework of CMHN are shown in the following two figures.
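The dual-path idea above can be sketched as follows. This is a minimal illustration only, not the released CMHN implementation: the random projection matrices stand in for the learned video and query encoders, and all names, dimensions, and the sign-quantization step are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_BITS = 64  # length of the binary hash codes (hypothetical)

def encode(features, proj):
    """Map real-valued features to binary hash codes via sign quantization."""
    return np.sign(features @ proj).astype(np.int8)

# Stand-ins for the two learned branches: one projection per modality.
video_proj = rng.standard_normal((500, CODE_BITS))  # video branch
query_proj = rng.standard_normal((300, CODE_BITS))  # query branch

# Hash codes for 1000 candidate moments and one language query
# (random features here; CMHN would use learned representations).
moment_codes = encode(rng.standard_normal((1000, 500)), video_proj)
query_code = encode(rng.standard_normal(300), query_proj)

# Retrieval: rank candidate moments by Hamming distance to the query code,
# i.e. the number of differing bits.
hamming = np.count_nonzero(moment_codes != query_code, axis=1)
top5 = np.argsort(hamming)[:5]
print(top5, hamming[top5])
```

The efficiency argument for hashing is that, once both modalities live in a shared binary space, ranking reduces to Hamming-distance comparisons, which hardware can evaluate with XOR and popcount far faster than real-valued similarity search.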

Dataset

For convenience of training and testing, we have packaged the dataset and will upload it later.

How to run

Please place the data files in the appropriate directory and set the path in tacos.py and activitynet_captions.py.

python tacos.py

or

python activitynet_captions.py
