Repositories under the lmm topic:
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking
😎 A curated list of awesome LMM hallucination papers, methods & resources.
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
LLaVA inference with multiple images at once for cross-image analysis.
[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Make Large Multimodal Models excel in object detection, ICCV 2025
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
Addressing catastrophic forgetting in LMMs, AAAI 2025
[ACL 2025 🔥] Time Travel: a comprehensive benchmark to evaluate LMMs on historical and cultural artifacts
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
R code and datasets for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications by Walter W. Stroup
A Mathematica paclet for analyzing and deriving Runge–Kutta, linear multistep, and general linear methods
Use Gemini to auto-label images for use with Autodistill.
A Novel Method to Visualize Multimodal AI Sentiment Arcs in Long-Form Narratives