dongyh20

Yuhao Dong's repositories

Octopus

[ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF that excels at embodied visual planning and programming.

Language: Python | 293 9 12

Insight-V

[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Language: Python | 221 10 14

Chain-of-Spot

Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models

Language: Python | License: Apache-2.0 | 96 5 8

C2P

[CVPR2023] Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning

Language: Python | 17 1 1

Awesome-RL-based-Reasoning-MLLMs

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based Reasoning MLLMs here!

License: MIT | 0 0 0

Awesome_Think_With_Images

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

0 0 0

c2p.github.io

Project Page for C2P [CVPR2023]

Language: JavaScript | 0 1 0

dongyh20

0 1 0

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | 0 0 0

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

Language: Python | License: Apache-2.0 | 0 0 0