Beast code in Giters

Shawn J.'s starred repositories

tarsier

Language: Python | Apache-2.0 | 70 | 0 | 0

LaCLIP

[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"

Language: Python | BSD-2-Clause | 242 | 0 | 0

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python | 1178 | 0 | 0

DCI

Densely Captioned Images (DCI) dataset repository.

Language: Python | NOASSERTION | 148 | 0 | 0

videocon

Language: Python | MIT | 50 | 0 | 0

HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python | Apache-2.0 | 99 | 0 | 0

UCoFiA

PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)

Language: Python | MIT | 49 | 0 | 0

DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

Language: Python | NOASSERTION | 68 | 0 | 0

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

MIT | 180 | 0 | 0

evit

Python code for the ICLR 2022 spotlight paper "EViT: Expediting Vision Transformers via Token Reorganizations"

Language: Python | Apache-2.0 | 162 | 0 | 0

ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Language: Python | NOASSERTION | 909 | 0 | 0

vid-TLDR

Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".

Language: Python | MIT | 25 | 0 | 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

10837 | 0 | 0

Long-CLIP

[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | Apache-2.0 | 491 | 0 | 0

InternVideo2

MIT | 183 | 0 | 0

FAVDBench

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

Language: Python | Apache-2.0 | 72 | 0 | 0

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python | MIT | 2033 | 0 | 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model with performance approaching GPT-4o.

Language: Python | MIT | 4394 | 0 | 0

Awesome-Parameter-Efficient-Transfer-Learning

Collection of awesome parameter-efficient fine-tuning resources.

414 | 0 | 0

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | Apache-2.0 | 714 | 0 | 0

all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

Language: Python | 275 | 0 | 0

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1009 | 0 | 0

MCQ

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Language: Python | 135 | 0 | 0

DTL

This repository is the official implementation of "DTL: Disentangled Transfer Learning for Visual Recognition", accepted at AAAI 2024.

Language: Python | MIT | 23 | 0 | 0

Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Language: Python | CC-BY-4.0 | 72 | 0 | 0

awesome-video-text-retrieval

A curated list of deep learning resources for video-text retrieval.

568 | 0 | 0

CLIP_benchmark

CLIP-like model evaluation

Language: Jupyter Notebook | MIT | 545 | 0 | 0

CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python | MIT | 821 | 0 | 0

MMVP

Language: Python | 256 | 0 | 0

Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Language: Python | MIT | 220 | 0 | 0