wenjiajia123's starred repositories

QVLM

[NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models.

Language: Python | License: Apache-2.0 | Stargazers: 28 | Issues: 0

VISA

[ECCV'24] VISA: Reasoning Video Object Segmentation via Large Language Model

Language: Python | Stargazers: 114 | Issues: 0

VoCo-LLaMA

Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Language: Python | License: Apache-2.0 | Stargazers: 76 | Issues: 0

DPMesh

This repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery" (CVPR 2024).

Language: Python | License: MIT | Stargazers: 32 | Issues: 0

FlowIE

This repository contains the official implementation of "FlowIE: Efficient Image Enhancement via Rectified Flow"

Language: Python | License: MIT | Stargazers: 80 | Issues: 0

Language: Python | Stargazers: 21 | Issues: 0

UVCOM

[CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Language: Python | License: MIT | Stargazers: 69 | Issues: 0

MotionLCM

[ECCV 2024] Official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model"

Language: Python | License: NOASSERTION | Stargazers: 225 | Issues: 0

RSBuilding

This is the PyTorch implementation of our paper "RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model"

Language: Python | License: Apache-2.0 | Stargazers: 101 | Issues: 0

Awesome-Multimodal-Large-Language-Models

Latest Advances on Multimodal Large Language Models

Stargazers: 12133 | Issues: 0

SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Language: Python | License: BSD-3-Clause | Stargazers: 177 | Issues: 0

moment_detr

[NeurIPS 2021] Moment-DETR code and QVHighlights dataset

Language: Python | License: MIT | Stargazers: 263 | Issues: 0

UniVTG

[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding

Language: Python | License: MIT | Stargazers: 316 | Issues: 0

GVL

Official implementation of the paper "Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos"

License: MIT | Stargazers: 1 | Issues: 0

Language: Python | Stargazers: 5 | Issues: 0

ChatVID

Chat about anything on any video!

Language: Python | License: MIT | Stargazers: 34 | Issues: 0

WSAG

[EMNLP'22] Weakly-Supervised Temporal Article Grounding

Language: Python | Stargazers: 14 | Issues: 0

mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

Language: Python | License: Apache-2.0 | Stargazers: 4222 | Issues: 0