There are 20 repositories under the lvlm topic.
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" (a generic token-reduction sketch follows this list).
Latest advances on (RL-based) multimodal reasoning and generation in Multimodal Large Language Models.
📜 Paper list on decoding methods for LLMs and LVLMs
CLIP-MoE: Mixture of Experts for CLIP
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget (see the budget-pruning sketch after this list).
The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".
LEMMA: An effective and explainable approach to detecting multimodal misinformation with an LVLM and external knowledge augmentation, leveraging the intuition and reasoning capability of the LVLM.
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
Code for USENIX Security 2024 paper: Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models
A novel approach that leverages LVLMs to efficiently generate high-quality synthetic VQA-NLE datasets.
A powerful Streamlit application that allows users to analyze and interact with YouTube video content through natural language questions.
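As a rough illustration of the similarity-plus-importance idea named in the video token reduction entry above, here is a minimal sketch assuming generic PyTorch tensors; the scoring (token norm as importance, cosine similarity to the mean token as redundancy), the weighting `alpha`, and the function name `reduce_video_tokens` are hypothetical choices for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def reduce_video_tokens(tokens: torch.Tensor, keep_ratio: float = 0.5, alpha: float = 0.5) -> torch.Tensor:
    """Keep the top `keep_ratio` fraction of video tokens, scored by a mix of
    importance (token norm, a crude proxy) and redundancy (cosine similarity
    to the mean token). Hypothetical scoring, for illustration only."""
    # tokens: (num_tokens, dim)
    mean_tok = tokens.mean(dim=0, keepdim=True)                 # (1, dim)
    redundancy = F.cosine_similarity(tokens, mean_tok, dim=-1)  # (num_tokens,)
    importance = tokens.norm(dim=-1)
    importance = importance / importance.max().clamp_min(1e-6)
    # Prefer important, non-redundant tokens.
    score = alpha * importance - (1.0 - alpha) * redundancy
    k = max(1, int(keep_ratio * tokens.size(0)))
    keep_idx = score.topk(k).indices.sort().values  # preserve temporal order
    return tokens[keep_idx]

# Example: reduce 1024 video tokens of width 768 to roughly half
pruned = reduce_video_tokens(torch.randn(1024, 768), keep_ratio=0.5)
```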
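Similarly, a minimal sketch of pruning visual tokens to a fixed budget in the spirit of the HiRED entry above; using [CLS] attention as the ranking signal and the name `prune_visual_tokens` are assumptions for illustration, not HiRED's exact method.

```python
import torch

def prune_visual_tokens(patch_tokens: torch.Tensor,
                        cls_attention: torch.Tensor,
                        token_budget: int = 576) -> torch.Tensor:
    """Drop visual patch tokens down to a fixed budget before they reach the
    language model, ranking patches by the attention the [CLS] token pays
    them (an assumed, generic ranking signal)."""
    # patch_tokens: (num_patches, dim); cls_attention: (num_patches,)
    if patch_tokens.size(0) <= token_budget:
        return patch_tokens
    keep_idx = cls_attention.topk(token_budget).indices.sort().values  # keep spatial order
    return patch_tokens[keep_idx]

# Example: 2880 high-resolution patch tokens pruned to a 576-token budget
tokens = torch.randn(2880, 1024)
attn = torch.rand(2880)
pruned = prune_visual_tokens(tokens, attn, token_budget=576)
```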