Repositories under the lmm topic:
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking
😎 A curated list of awesome LMM hallucination papers, methods & resources.
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
LLaVA inference with multiple images at once for cross-image analysis.
[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Make Large Multimodal Models excel in object detection, ICCV 2025
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
Addressing catastrophic forgetting in LMMs, AAAI 2025
[ACL 2025 🔥] Time Travel: a comprehensive benchmark to evaluate LMMs on historical and cultural artifacts
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
R code and datasets for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications by Walter W. Stroup
A Mathematica paclet for analyzing and deriving Runge–Kutta, linear multistep, and general linear methods
Use Gemini to auto-label images for use with Autodistill.
A Novel Method to Visualize Multimodal AI Sentiment Arcs in Long-Form Narratives