facebookresearch / EgoVLPv2

Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Z. Shou, Rama Chellappa, Pengchuan Zhang
ICCV, 2023
arXiv | project page

TL;DR: We introduce the second generation of egocentric video-language pre-training, EgoVLPv2, a significant improvement over the previous generation, achieved by incorporating cross-modal fusion directly into the video and language backbones.
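For intuition, the following is a minimal, hypothetical PyTorch sketch of the fusion-in-the-backbone idea: gated cross-attention is inserted inside the encoder blocks themselves, rather than in a separate fusion stack on top. All module and parameter names here are illustrative assumptions, not the actual EgoVLPv2 implementation.

import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    # Transformer block with an extra gated cross-attention step.
    # Illustrative sketch only; not the actual EgoVLPv2 code.
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Zero-initialized gate: the block starts as a plain unimodal encoder
        # layer and gradually learns to attend to the other modality.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, other=None):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        if other is not None:  # fused forward; omit `other` for unimodal use
            h = self.norm2(x)
            x = x + self.gate * self.cross_attn(h, other, other, need_weights=False)[0]
        return x + self.mlp(self.norm3(x))

video = torch.randn(2, 196, 768)  # video patch tokens
text = torch.randn(2, 32, 768)    # text tokens
block = FusionBlock()
fused = block(video, other=text)  # cross-modal encoding
unimodal = block(video)           # the same backbone doubles as a unimodal encoder

Because the gate starts at zero, the same backbone can act both as a dual (unimodal) encoder and as a fusion encoder, which reflects the flexibility the paper's design targets.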

[Figure: EgoVLPv2]

πŸ“’ News

  • [September, 2023] We release the EgoVLPv2 codebase, checkpoints, and features.
  • [July, 2023] EgoVLPv2 is accepted at ICCV 2023.

πŸ“ Repository Structure

The contents of this repository are structured as follows:

EgoVLPv2
    β”œβ”€β”€ EgoVLPv2
    β”‚   β”œβ”€β”€ Pre-training on EgoClip version of Ego4D
    β”‚   β”œβ”€β”€ Validation on EgoMCQ 
    β”‚   β”œβ”€β”€ Zero-Shot and fine-tuning on EK-100 MIR
    β”‚   β”œβ”€β”€ Zero-shot and fine-tuning on Charades-Ego
    β”‚   └── Feature extraction on EgoMQ
    β”œβ”€β”€ EgoTaskQA
    β”‚   └── Fine-tuning on EgoTaskQA direct and indirect splits
    β”œβ”€β”€ EgoNLQ
    β”‚   └── Feature extraction and head-tuning on EgoNLQ 
    β”œβ”€β”€ QFVS
    β”‚   └── Feature extraction and head-tuning on QFVS
    └── EgoMQ
        └── Head-tuning on EgoMQ 

Each directory contains data settings, training/inference scripts, and checkpoints. Notably, we provide pre-extracted video and text features to power the Ego4D NLQ & MQ challenges; a hypothetical feature-loading sketch follows.
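As a sketch of how such pre-extracted features are typically consumed, the snippet below loads per-clip video and text features saved as PyTorch tensors. The file names and tensor shapes are assumptions for illustration; consult the EgoNLQ and EgoMQ directories for the actual feature format.

import torch
import torch.nn.functional as F

# Hypothetical file names; see the EgoNLQ / EgoMQ directories for the
# actual layout and naming of the released features.
video_feats = torch.load("features/egonlq/clip_0001_video.pt", map_location="cpu")
text_feats = torch.load("features/egonlq/clip_0001_text.pt", map_location="cpu")
print(video_feats.shape)  # e.g. (num_windows, feature_dim)
print(text_feats.shape)   # e.g. (num_tokens, feature_dim)

# A common head-tuning pattern: score each video window against the
# mean-pooled query embedding via cosine similarity.
query = text_feats.mean(dim=0, keepdim=True)
scores = F.cosine_similarity(video_feats, query, dim=-1)
best_window = scores.argmax().item()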

πŸ› οΈ Environment Preparation

conda create -n egovlpv2 python=3.8.13 pip
conda activate egovlpv2
pip install -r requirements.txt
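After installation, a quick sanity check (assuming requirements.txt installs PyTorch, which the training scripts rely on) confirms that the environment resolves and a GPU is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"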

βœ‰οΈ Contact

This repository is created and maintained by Shraman. Questions and discussions are welcome via spraman3@jhu.edu. We are happy to merge results if you transfer EgoVLPv2 to other egocentric tasks or datasets.

πŸ™ Acknowledgements

The codebase for this work is built on the EgoVLP, LAVILA, FIBER, and VSLNet repositories. We would like to thank the respective authors for their contributions, and the Meta AI team for discussions and feedback.

πŸ“„ License

EgoVLPv2 is licensed under the MIT License.

πŸŽ“ Citing EgoVLPv2

@article{pramanick2023egovlpv2,
  title={EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone},
  author={Pramanick, Shraman and Song, Yale and Nag, Sayan and Lin, Kevin Qinghong and Shah, Hardik and Shou, Mike Zheng and Chellappa, Rama and Zhang, Pengchuan},
  journal={arXiv preprint arXiv:2307.05463},
  year={2023}
}
