TXH-mercury

Institute of Automation, Chinese Academy of Sciences

Beijing

Sihan Chen's starred repositories

node-v0.x-archive

Moved to https://github.com/nodejs/node

34478 2045 6379

Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles: Latest Advances on Multimodal Large Language Models

SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Language: Python · License: Apache-2.0 · 6379 95 664

awesome-segment-anything

Tracking and collecting papers/projects/others related to Segment Anything.

Oscar

Oscar and VinVL

Language: Python · License: MIT · 1034 25 202

R-Drop

Language: Python · 864 5 32

Awesome-CV-Foundational-Models

RGBD_Semantic_Segmentation_PyTorch

[ECCV 2020] PyTorch Implementation of some RGBD Semantic Segmentation models.

Language: Python · License: MIT · 284 4 30

VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Language: Python · License: MIT · 264 15 17

MultiModal_BigModels_Survey

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python · License: MIT · 245 9 21

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook · License: MIT · 212 18 24

resume

A bilingual (Chinese and English) personal résumé generated by compiling LaTeX

Language: TeX · License: MIT · 195 40

DDP

Language: Python · 154 9 15

2dtan

An optimized re-implementation for 2D-TAN: Learning 2D Temporal Localization Networks for Moment Localization with Natural Language (AAAI'2020).

Language: Python · 123 7 14

DyCo3D

Language: Python · License: NOASSERTION · 119 10 23

ChatBridge

ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

Language: Python · License: BSD-3-Clause · 42 2 7

COSA

Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python · License: MIT · 37 2 4

MOSO

Language: Python · 32 1 5

GVL

Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

Language: Python · License: MIT · 25 2 7

OPT_Questioner

Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"

Language: Python · License: MIT · 1300

DANet

Dual Attention Network for Scene Segmentation (CVPR2019)

Language: Python · License: MIT · 2 10

longterm_datasets

Official repository of the paper "Are current long-term video understanding datasets long-term?", published in CVEU 2023.

Language: HTML · License: GPL-3.0 · 100