lntzm's starred repositories
Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
Transformer-Explainability
[CVPR 2021] Official PyTorch implementation of Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
GroundingGPT
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
MomentDiff
MomentDiff: Generative Video Moment Retrieval from Random to Real (NeurIPS 2023)
Language-Enhanced-CLIP-For-Multi-label-Image-Recognition
3rd Place, Visual Prompt Tuning Challenge @ CVPR 2023 HIT Workshop (2023)