wenjiajia123's repositories
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
cpl
CPL: Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
DCSR
[ICCV 2021 (Oral Presentation)] Dual-Camera Super-Resolution with Aligned Attention Modules (RefSR)
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
moment_detr
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
QD-DETR
Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)
SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
UniVTG
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
UnLoc
UnLoc: A Unified Framework for Video Localization Tasks
UVCOM
[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
YouwikiHow
YouwikiHow dataset for weakly-supervised article grounding