Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval


Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiao-Yong Wei, Chang Wen Chen, and Qing Li.


Official PyTorch implementation of 'Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval'

Installation | Dataset | Training | Evaluation | Model Zoo

📒 News

[2024.7.21] Our paper was accepted to ACM Multimedia 2024 (Oral).

[2024.7.10] The code and datasets for the related tasks have been released.

[2024.5.10] The repository is now public.

[2024.4.10] The repository was created.

⚙️ Installation

  1. Clone the repository from GitHub.
git clone https://github.com/fletcherjiang/LLMEPET.git
cd LLMEPET
  2. Create the conda environment.
conda create -n LLMEPET python=3.8
conda activate LLMEPET
  3. Install the required packages.
pip install -r requirements.txt
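
After installation, you can quickly verify that PyTorch sees the GPU. The snippet below is a minimal sanity check, not part of the repository; the exact package versions are pinned in requirements.txt.

# Minimal environment sanity check (illustrative; not part of the repository).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    # The training scripts assume at least one visible GPU.
    print("GPU:", torch.cuda.get_device_name(0))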

🗂️ Dataset

For all datasets, we provide pre-extracted features. Download them and place them in features/.

The prepared data should follow the directory structure below.

.
├── LLMEPET
│   ├── llm_epet
│   ├── data
│   ├── results
│   ├── run_on_video
│   ├── standalone_eval
│   └── utils
├── data
├── features
│   ├── qvhighlight
│   ├── charades
│   ├── tacos
│   ├── tvsum
│   └── youtube_uni
├── llama
│   ├── consolidated.00.pth
│   ├── tokenizer.model
│   └── params.json
├── README.md
└── ···
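
Before training, a short script along these lines can catch missing files early. It is a hedged sketch that only checks the paths shown above; adjust it if your layout differs.

# check_layout.py -- illustrative sketch that verifies the layout shown above.
from pathlib import Path

EXPECTED = [
    "features/qvhighlight",
    "features/charades",
    "features/tacos",
    "features/tvsum",
    "features/youtube_uni",
    "llama/consolidated.00.pth",
    "llama/tokenizer.model",
    "llama/params.json",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
if missing:
    print("Missing paths:\n  " + "\n  ".join(missing))
else:
    print("All expected feature and checkpoint paths are in place.")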

🦙 LLaMA Checkpoint

If you want to try LLaMA-2 or LLaMA-3, download the corresponding checkpoints from the official LLaMA-2 or LLaMA-3 releases and adapt llm_epet/llama.py to load them.
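
For orientation, the checkpoint files in llama/ can be inspected with standard PyTorch calls. This is an illustrative sketch only; the repository's actual loading logic lives in llm_epet/llama.py.

# Illustrative only: inspect a LLaMA checkpoint in the consolidated format.
import json
import torch

with open("llama/params.json") as f:
    params = json.load(f)  # model hyperparameters (dim, n_layers, n_heads, ...)
print(params)

# map_location="cpu" avoids allocating GPU memory just to look at shapes.
state_dict = torch.load("llama/consolidated.00.pth", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))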

🚀 Training

QVHighlights Training

bash llm_epet/scripts/train.sh  

Charades-STA

bash llm_epet/scripts/charades_sta/train.sh

TACoS

bash llm_epet/scripts/tacos/train.sh  

TVSum

bash llm_epet/scripts/tvsum/train_tvsum.sh  

YouTube-HL

bash llm_epet/scripts/youtube_uni/train.sh  

⭐ QVHighlights Evaluation and Submission

bash llm_epet/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash llm_epet/scripts/inference.sh results/{direc}/model_best.ckpt 'test'

Pack the hl_val_submission.jsonl and hl_test_submission.jsonl files and submit them to the CodaLab evaluation server, e.g. as sketched below.
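
A minimal packing sketch in Python (assuming the two files sit in your current directory; the archive name is arbitrary):

# Illustrative sketch: pack the submission files for CodaLab.
import zipfile

files = ["hl_val_submission.jsonl", "hl_test_submission.jsonl"]
with zipfile.ZipFile("submission.zip", "w") as zf:
    for name in files:
        zf.write(name)  # assumes the files are in the current directory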

📦 Model Zoo

Dataset                           Model file
QVHighlights (SlowFast + CLIP)    checkpoints
Charades-STA (SlowFast + CLIP)    checkpoints
TACoS                             checkpoints
TVSum                             checkpoints
YouTube-HL                        checkpoints

📖 Citation

If you find the repository or the paper useful, please cite it with the following BibTeX entry.

@inproceedings{jiang2024prior,
  title={Prior Knowledge Integration via {LLM} Encoding and Pseudo Event Regulation for Video Moment Retrieval},
  author={Yiyang Jiang and Wengyu Zhang and Xulu Zhang and Xiaoyong Wei and Chang Wen Chen and Qing Li},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://arxiv.org/abs/2407.15051}
}


License: BSD 3-Clause "New" or "Revised" License

