Beast code in Giters

lntzm's starred repositories

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

11993 271 109

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python · MIT · 3558 100 160

Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python · Apache-2.0 · 2868 30 111

Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Language: Jupyter Notebook · MIT · 1771 21 62

VideoX

VideoX: a collection of video cross-modal models

Language: Python · NOASSERTION · 968 21 112

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Language: Python · Apache-2.0 · 756 9 82

PLLaVA

Official repository for the paper PLLaVA

Language: Python · 568 13 74

MovieChat

[CVPR 2024] 🎬💭 Chat with over 10K frames of video!

Language: Python · BSD-3-Clause · 499 10 75

SAM-6D

[CVPR2024] Code for "SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation".

Language: Python · 329 22 71

UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Language: Python · MIT · 315 6 46

GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Language: Python · Apache-2.0 · 286 14 10

DEADiff

[CVPR 2024] Official implementation of "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations"

Language: Python · Apache-2.0 · 214 11 17

VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language: Python · NOASSERTION · 208 2 35

QD-DETR

Official PyTorch repository for "QD-DETR: Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 Paper)

Language: Python · NOASSERTION · 196 4 44

Awesome_Long_Form_Video_Understanding

Awesome papers & datasets specifically focused on long-term videos.

168 90

TriDet

[CVPR2023] Code for the paper, TriDet: Temporal Action Detection with Relative Boundary Modeling

Language: Python · MIT · 161 3 37

HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python · Apache-2.0 · 103 4 7

BACL

Balanced Classification: A Unified Framework for Long-Tailed Object Detection (TMM 2023)

Language: Python · Apache-2.0 · 95 4 8

ProST

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (ICCV 2023 Oral)

Language: Python · Apache-2.0 · 89 3 7

HiREST

Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)

Language: Python · MIT · 89 5 12

MomentDiff

MomentDiff: Generative Video Moment Retrieval from Random to Real (NeurIPS 2023)

Language: Python · NOASSERTION · 73 4 9

VideoTree

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Language: Python · MIT · 70 2 4

ViGA

"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.

Language: Python · MIT · 64 2 5

DHVT

This is an official implementation of our NeurIPS 2022 paper "Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets".

Language: Python · Apache-2.0 · 51 3 2