multimodal-learning video-grounding video-retrieval

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval

PyTorch implementation of the paper:

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval.

Minjoon Jung, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang

Getting started

This project is implemented in PyTorch and uses Anaconda to manage the environment.

Get the repo

git clone https://github.com/minjoong507/MPGN.git

Prepare feature files

Download the original feature files. Also, please check here to generate pseudo supervision.

Environment

Our environment:

Python 3.9
PyTorch 1.13.1
CUDA 12.0

You can also run our code in the same environment as TVR.
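To compare a local setup against the versions listed above, a small check like the following may help. This is a minimal sketch and not part of the repository: the function name `describe_environment` is hypothetical, and the script only reports versions without enforcing them.

```python
import importlib.util
import platform


def describe_environment():
    """Collect the versions relevant to the environment listed above.

    Hypothetical helper for illustration; it is not part of the MPGN code.
    """
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    # Import torch only if it is installed, so the check also runs on a fresh machine.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # CUDA version PyTorch was built with (None for CPU-only builds).
        info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not found'}")
```

Note that the reported CUDA value is the version PyTorch was built against, which can differ from the CUDA toolkit or driver version installed on the system.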

Training the model

We provide the code for training Cross-modal Moment Localization (XML).

To train the model with pseudo queries, pass --training_w_pseudo_supervision and use --training_strategy to select the type of pseudo queries (visual, textual, or aug). aug uses both types of pseudo queries.

bash baselines/crossmodal_moment_localization/scripts/train.sh \
tvr video_sub resnet_i3d \
--exp_id test_run \
--training_w_pseudo_supervision \
--training_strategy aug
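To run the same command once per pseudo-query type, the arguments can be assembled programmatically. The sketch below is an illustration, not part of the repository: the helper `build_train_command` and the `pseudo_<strategy>` experiment ids are hypothetical, while the script path, positional arguments, and flags are copied from the command above.

```python
import shlex

# Script path and positional arguments are taken from the command above.
TRAIN_SCRIPT = "baselines/crossmodal_moment_localization/scripts/train.sh"
# Strategy values named in this README.
STRATEGIES = ("visual", "textual", "aug")


def build_train_command(strategy, exp_id="test_run"):
    """Return the training command as an argument list (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {STRATEGIES}")
    return [
        "bash", TRAIN_SCRIPT,
        "tvr", "video_sub", "resnet_i3d",
        "--exp_id", exp_id,
        "--training_w_pseudo_supervision",
        "--training_strategy", strategy,
    ]


if __name__ == "__main__":
    # Print one command per strategy; the exp_id naming is an assumption.
    for strategy in STRATEGIES:
        print(shlex.join(build_train_command(strategy, exp_id=f"pseudo_{strategy}")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch training instead of printing the command.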

Citations

If our project is useful to your research, please consider citing our paper:

@inproceedings{jung2022modal,
  title={Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval},
  author={Minjoon Jung and Seongho Choi and Joochan Kim and Jin-Hwa Kim and Byoung-Tak Zhang},
  booktitle={EMNLP},
  year={2022}
}

Acknowledgement

Our project builds on the TVR codebase. We thank the authors for sharing their great work.

About

[EMNLP 2022] PyTorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"

https://arxiv.org/abs/2210.12617

MIT License
