OpenGVLab

OpenGVLab's repositories

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型

Language:PythonMIT9444 65 1079

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language:PythonApache-2.02099 25 281

OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Language:PythonMIT867 15 91

ScaleCUA

ScaleCUA is the open-sourced computer use agents that can operate on corss-platform environments (Windows, macOS, Ubuntu, Android).

Language:PythonApache-2.080400

VideoChat-Flash

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Language:PythonMIT478 20 72

OmniCorpus

[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Language:Python401 12 11

PonderV2

[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

Language:PythonMIT363 19 30

EfficientQAT

[ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Language:Python308 5 29

VideoChat-R1

[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning

Language:Python220 8 34

EgoVideo

[CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Chanllenges in CVPR 2024

Language:Jupyter Notebook131 1 20

[ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 212 apps, and 1.4K app combos.

Language:Python131 3 18

PIIP

[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)

Language:PythonMIT105 6 5

ZeroGUI

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Language:PythonApache-2.0100 1 14

Mono-InternVL

[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Language:PythonMIT91 2 8

VeBrain

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

MIT85 2 4

NaViL

Language:PythonMIT8300

MUTR

「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation

Language:PythonMIT82 2 8

EgoExoLearn

[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset

Language:PythonMIT70 1 10

SDLM

Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high efficiency and throughput.

Language:PythonMIT6804