Repositories under the large-vision-language-models topic:
✨✨ Latest Advances on Multimodal Large Language Models
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (a minimal sketch of the decomposition follows this list)
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
A paper list on large multi-modality models (perception, generation, unification), parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
Curated papers on Large Language Models in the healthcare and medical domain
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
GeoPixel: a pixel-grounding large multimodal model developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[ACM Multimedia 2025] The official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.
[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
This repository is the codebase for "TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy"
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Official PyTorch implementation of the paper "AnyAnomaly"
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models
🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
[CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models"
The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models" (a generic token-reduction sketch follows this list)
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML 2025 Oral; see the initialization sketch after this list)
Latest Advances on Modality Priors in Multimodal Large Language Models
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
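For the DoRA entry above: the name describes decomposing a pretrained weight into a magnitude vector and a direction matrix, with the low-rank update applied only to the direction. Below is a minimal PyTorch sketch of that decomposition; the class and attribute names are illustrative and not the official repository's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinearSketch(nn.Module):
    """Minimal sketch of weight-decomposed low-rank adaptation (DoRA).

    W' = m * (W0 + B @ A) / ||W0 + B @ A||_col, where only m, A, and B are trained.
    Names and initialization constants are illustrative, not the official code.
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pretrained weight W0 and its (kept as-is) bias.
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # Magnitude vector m: one scalar per column of W0, initialized to the column norms.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=0))
        # Low-rank direction update; B starts at zero so the initial weight equals W0.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        directional = self.weight + self.lora_B @ self.lora_A
        # Normalize each column, then rescale by the learned magnitude.
        column_norm = directional.norm(p=2, dim=0, keepdim=True)
        adapted = self.magnitude * directional / column_norm
        return F.linear(x, adapted, self.bias)
```

Only the magnitude vector and the two low-rank factors receive gradients; because `lora_B` starts at zero, the adapted weight equals the frozen base weight at initialization.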
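For the compact-vision-tokens entry above: the general motivation is that a large multimodal model spends much of its compute on the long sequence of vision tokens fed into the LLM, so reducing that sequence speeds up inference. The snippet below only illustrates that broader idea with simple average pooling over adjacent tokens; it is not the paper's method, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def pool_vision_tokens(tokens: torch.Tensor, stride: int = 4) -> torch.Tensor:
    """Generic vision-token reduction by average-pooling adjacent tokens.

    tokens: (batch, num_tokens, hidden) output of a vision encoder/projector.
    Returns roughly num_tokens / stride tokens. Illustrative only; a learned
    compression module would differ.
    """
    x = tokens.transpose(1, 2)                              # (batch, hidden, num_tokens)
    x = F.avg_pool1d(x, kernel_size=stride, stride=stride, ceil_mode=True)
    return x.transpose(1, 2)                                # (batch, ~num_tokens/stride, hidden)

# Example: 576 patch tokens reduced to 144 before they reach the language model.
vision_tokens = torch.randn(1, 576, 1024)
print(pool_vision_tokens(vision_tokens).shape)              # torch.Size([1, 144, 1024])
```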
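For the LoRA-One entry above: the title suggests that a single full-gradient step carries enough information to set up the low-rank adapter. The sketch below shows one way to use that signal, initializing LoRA factors from a truncated SVD of a one-step full gradient of the frozen weight; treat it as a reading of the title rather than the repository's exact algorithm, and note that the function name, factor split, and scaling are assumptions.

```python
import torch

def spectral_init_from_gradient(grad: torch.Tensor, rank: int, scale: float = 1.0):
    """Hypothetical sketch: initialize low-rank factors (B, A) from the top-r
    singular subspace of a one-step full gradient G of a frozen weight W0.

    Generic spectral initialization consistent with the LoRA-One title,
    not necessarily the repository's procedure.
    """
    # Truncated SVD of the full gradient (out_features x in_features).
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    # Split the singular values between the factors so that B @ A ≈ -scale * G_r,
    # i.e. the adapter starts out along a rank-r gradient-descent direction.
    B = -scale * U_r * S_r.sqrt()           # (out_features, rank)
    A = S_r.sqrt().unsqueeze(1) * Vh_r      # (rank, in_features)
    return B, A
```

In this sketch, `grad` would be the full-batch gradient of the frozen weight after one forward/backward pass over the fine-tuning data, and the returned factors would replace the usual random/zero LoRA initialization.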