There are 3 repositories under the mllm topic.
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
A Video Chat Agent with Temporal Prior
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
LVLM, LLM, MLLM, Multimodal Large Language Model, Large Language Model, Alignment, AI System, Survey
Learning and research on MLLMs, based on the MME rankings.
🖼️Latest Papers on Visually(Imagination)-Augmented NLP
Awesome list for attacks on large language models.
🤖 A collection of paper lists for NLP-related papers on GitHub