Sihan Chen (TXH-mercury)

Company: Institute of Automation, Chinese Academy of Sciences

Location: Beijing

Sihan Chen's starred repositories

node-v0.x-archive

Moved to https://github.com/nodejs/node

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Language: Python | License: Apache-2.0 | Stargazers: 6379 | Issues: 95 | Issues: 664

awesome-segment-anything

Tracking and collecting papers/projects/others related to Segment Anything.

Oscar

Oscar and VinVL

Language: Python | License: MIT | Stargazers: 1034 | Issues: 25 | Issues: 202

RGBD_Semantic_Segmentation_PyTorch

[ECCV 2020] PyTorch Implementation of some RGBD Semantic Segmentation models.

Language: Python | License: MIT | Stargazers: 284 | Issues: 4 | Issues: 30

VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Language: Python | License: MIT | Stargazers: 264 | Issues: 15 | Issues: 17

MultiModal_BigModels_Survey

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python | License: MIT | Stargazers: 245 | Issues: 9 | Issues: 21

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | Stargazers: 212 | Issues: 18 | Issues: 24

resume

Personal résumé in Chinese and English, compiled with LaTeX

Language: TeX | License: MIT | Stargazers: 195 | Issues: 4 | Issues: 0

2dtan

An optimized re-implementation for 2D-TAN: Learning 2D Temporal Localization Networks for Moment Localization with Natural Language (AAAI'2020).

Language: Python | License: NOASSERTION | Stargazers: 119 | Issues: 10 | Issues: 23

ChatBridge

ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

Language: Python | License: BSD-3-Clause | Stargazers: 42 | Issues: 2 | Issues: 7

COSA

Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python | License: MIT | Stargazers: 37 | Issues: 2 | Issues: 4

GVL

Official implementation of the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"

Language: Python | License: MIT | Stargazers: 25 | Issues: 2 | Issues: 7

OPT_Questioner

Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"

Language: Python | License: MIT | Stargazers: 13 | Issues: 0 | Issues: 0

DANet

Dual Attention Network for Scene Segmentation (CVPR 2019)

Language: Python | License: MIT | Stargazers: 2 | Issues: 1 | Issues: 0

longterm_datasets

Official repository of the paper "Are current long-term video understanding datasets long-term?", published in CVEU 2023.

Language: HTML | License: GPL-3.0 | Stargazers: 1 | Issues: 0 | Issues: 0