LMMs-Lab (EvolvingLMMs-Lab)

Organization data from GitHub: https://github.com/EvolvingLMMs-Lab

Feeling and building multimodal intelligence.

Location: Singapore

GitHub: @EvolvingLMMs-Lab

Twitter: @lmmslab

LMMs-Lab's repositories

Otter

🦦 Otter, a multimodal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and demonstrating improved instruction-following and in-context learning abilities.

Language: Python · License: MIT · Stargazers: 3276 · Issues: 79 · Issues: 165

lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Language: Python · License: NOASSERTION · Stargazers: 3263 · Issues: 6 · Issues: 378

open-r1-multimodal

A fork to add multimodal model training to open-r1

Language: Python · License: Apache-2.0 · Stargazers: 1416 · Issues: 13 · Issues: 28

LLaVA-OneVision-1.5

Fully Open Framework for Democratized Multimodal Training

Language: Python · License: Apache-2.0 · Stargazers: 605 · Issues: 0 · Issues: 0

lmms-engine

A simple, unified training engine for multimodal models. Lean, flexible, and built for hacking at scale.

Language: Python · Stargazers: 474 · Issues: 0 · Issues: 0

RelateAnything

The Relate Anything Model takes an image as input and uses SAM to identify the corresponding masks within the image.

Language: Python · License: Apache-2.0 · Stargazers: 455 · Issues: 9 · Issues: 12

LongVA

Long Context Transfer from Language to Vision

Language: Python · License: NOASSERTION · Stargazers: 396 · Issues: 7 · Issues: 37

multimodal-search-r1

MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Language: Python · License: Apache-2.0 · Stargazers: 347 · Issues: 0 · Issues: 0

EgoLife

[CVPR 2025] EgoLife: Towards Egocentric Life Assistant

Language: Python · License: NOASSERTION · Stargazers: 343 · Issues: 7 · Issues: 12

NEO

NEO Series: Native Vision-Language Models from First Principles

Language: Python · License: Apache-2.0 · Stargazers: 221 · Issues: 0 · Issues: 0

multimodal-sae

[ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.

Language: Python · License: NOASSERTION · Stargazers: 159 · Issues: 1 · Issues: 4

MGPO

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Stargazers: 51 · Issues: 0 · Issues: 0

sae

A framework for applying sparse autoencoders (SAEs) to any model.

Language: Python · Stargazers: 41 · Issues: 0 · Issues: 0
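
For readers unfamiliar with the technique this repository implements, a sparse autoencoder maps a model's activations into an overcomplete latent space with a sparsity penalty, so that individual latent features become more interpretable. The sketch below is a minimal, generic illustration of the encode/decode/loss structure in pure Python; it is not the repository's actual API, and all function and variable names here are hypothetical.

```python
# Minimal sparse-autoencoder sketch (generic illustration, not this repo's API).
# encode: z = relu(W_enc @ x + b_enc); decode: x_hat = W_dec @ z + b_dec,
# trained to minimize ||x - x_hat||^2 + l1_coeff * ||z||_1.

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def encode(W_enc, b_enc, x):
    """Map an activation vector to a (hopefully sparse) latent code."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    return relu(pre)

def decode(W_dec, b_dec, z):
    """Reconstruct the activation from the latent code."""
    return [r + b for r, b in zip(matvec(W_dec, z), b_dec)]

def sae_loss(x, x_hat, z, l1_coeff=0.01):
    """Reconstruction error plus L1 sparsity penalty on the code."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(v) for v in z)

# Toy example: 2-dim activations, 4 latent features (overcomplete dictionary).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0] * 4
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]

x = [0.5, -0.25]
z = encode(W_enc, b_enc, x)      # ReLU zeroes half the features: the code is sparse
x_hat = decode(W_dec, b_dec, z)  # perfect reconstruction in this toy setup
print(z, x_hat, sae_loss(x, x_hat, z))
```

In practice W_enc and W_dec are learned by gradient descent over many activation vectors, and the L1 coefficient trades reconstruction fidelity against sparsity.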

lean-runner

Deploy a high-performance Lean 4 server in one click.

Language: Python · License: MIT · Stargazers: 9 · Issues: 0 · Issues: 0

EASI

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

License: Apache-2.0 · Stargazers: 5 · Issues: 0 · Issues: 0

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python · License: Apache-2.0 · Stargazers: 3 · Issues: 0 · Issues: 0

VLMEvalKit

An open-source evaluation toolkit to evaluate MLLMs on Spatial Intelligence using the EASI protocol

Language: Python · License: Apache-2.0 · Stargazers: 3 · Issues: 0 · Issues: 0

openevolve

Open-source implementation of AlphaEvolve

Language: Python · License: Apache-2.0 · Stargazers: 2 · Issues: 0 · Issues: 0

agent-rl

A fork of verl that supports multi-turn tool use and many more agentic tasks.

License: MIT · Stargazers: 1 · Issues: 0 · Issues: 0

DeepseekLeanPlayground

The math library of Lean 4

License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0

DiffSynth-Studio

Enjoy the magic of Diffusion models!

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0 · Issues: 0