Shawn J. (Arsiuuu)

Location: Shanghai, China

Shawn J.'s starred repositories

Language: Python | License: Apache-2.0 | Stargazers: 70 | Issues: 0

LaCLIP

[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"

Language: Python | License: BSD-2-Clause | Stargazers: 242 | Issues: 0

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python | Stargazers: 1178 | Issues: 0

DCI

Densely Captioned Images (DCI) dataset repository.

Language: Python | License: NOASSERTION | Stargazers: 148 | Issues: 0
Language: Python | License: MIT | Stargazers: 50 | Issues: 0

HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python | License: Apache-2.0 | Stargazers: 99 | Issues: 0

UCoFiA

PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)

Language: Python | License: MIT | Stargazers: 49 | Issues: 0

DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

Language: Python | License: NOASSERTION | Stargazers: 68 | Issues: 0

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

License: MIT | Stargazers: 180 | Issues: 0

evit

Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Language: Python | License: Apache-2.0 | Stargazers: 162 | Issues: 0

ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Language: Python | License: NOASSERTION | Stargazers: 909 | Issues: 0

vid-TLDR

Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".

Language: Python | License: MIT | Stargazers: 25 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10837 | Issues: 0

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | License: Apache-2.0 | Stargazers: 491 | Issues: 0
License: MIT | Stargazers: 183 | Issues: 0

FAVDBench

[CVPR 2023] Official implementation of the paper: Fine-grained Audible Video Description

Language: Python | License: Apache-2.0 | Stargazers: 72 | Issues: 0

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python | License: MIT | Stargazers: 2033 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance.

Language: Python | License: MIT | Stargazers: 4394 | Issues: 0

Awesome-Parameter-Efficient-Transfer-Learning

Collection of awesome parameter-efficient fine-tuning resources.

Stargazers: 414 | Issues: 0

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 714 | Issues: 0

all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training

Language: Python | Stargazers: 275 | Issues: 0

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

Stargazers: 1009 | Issues: 0

MCQ

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Language: Python | Stargazers: 135 | Issues: 0

DTL

This repository is the official implementation of "DTL: Disentangled Transfer Learning for Visual Recognition", which is accepted by AAAI 2024.

Language: Python | License: MIT | Stargazers: 23 | Issues: 0

Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Language: Python | License: CC-BY-4.0 | Stargazers: 72 | Issues: 0

awesome-video-text-retrieval

A curated list of deep learning resources for video-text retrieval.

Stargazers: 568 | Issues: 0

CLIP_benchmark

CLIP-like model evaluation

Language: Jupyter Notebook | License: MIT | Stargazers: 545 | Issues: 0

CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python | License: MIT | Stargazers: 821 | Issues: 0
Language: Python | Stargazers: 256 | Issues: 0

Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Language: Python | License: MIT | Stargazers: 220 | Issues: 0