vision-language-models

There are 0 repository under vision-language-models topic.

baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
instruction-following large-language-models mllm multimodal-large-language-models vlm encoder-free-vlm llm clip vision-language-models
Language:Python 319
snap-research / MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
personalization vision-language-models
Language:Python 167
BAAI-Agents / GPA-LM
This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
agent-framework agents ai awesome-list gameai gameplay games gcc general-computer-control generative-ai large-language-models llm multimodal planning vision-language-models vlm
141
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
image-descriptions mllm multimodal-large-language-models vision-language-models visual-perception vlm
Language:Python 137
NishilBalar / Awesome-LVLM-Hallucination
up-to-date curated list of state-of-the-art Large vision language models hallucinations research work, papers & resources
hallucination hallucination-benchmark hallucination-detection hallucination-evaluation hallucination-mitigation hallucination-research hallucination-survey large-language-models large-vision-language-models llm lvlm mllm mlm multimodal-language-model multimodal-large-language-models vision-language-models
113
OpenGVLab / PIIP
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
computer-vision image-classification instance-segmentation multimodal-large-language-models object-detection semantic-segmentation vision-language-models vision-transformer
Language:Python 87
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
large-multimodal-models large-vision-language-model large-vision-language-models prompting vision-language-model vision-language-models visual-prompting
Language:Python 82
mbzuai-oryx / GeoPixel
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.
foundation-models grounding-llms large-multimodal-models large-vision-language-models remote-sensing segmentation-models vision-language-models
Language:Python 72
drive-bench / toolkit
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
autonomous-driving vision-language-models driving-with-language chatgpt internvl phi-3 qwen2-vl
Language:Python 64
elkhouryk / RS-TransCLIP
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
remote-sensing vision-language-models zero-shot-classification satellite-imagery scene-classification transductive-learning aerial-imagery image-classification earth-observation
Language:Python 60
erfanshayegani / Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
ai-safety alignment llm vision-language-models vlm cross-modality-safety-alignment multi-modal-models
Language:Python 45
lezhang7 / SAIL
[CVPR 2025 Highlight] Official Pytorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"
efficient-learning vision-language-models
Language:Jupyter Notebook 32
vanillaer / CPL-ICML2024
[ICML 2024] Offical code repo for ICML2024 paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
pseudolabels unlabeled-data vision-language-models
Language:Python 30
D2I-Group / awesome-vision-time-series
This is an official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
large-multimodal-models large-vision-models time-series vision-language-models vision-models
Language:Python 23
jiayuww / SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
large-language-models machine-learning multimodal-deep-learning reasoning spatial-reasoning vision-language-models claude foundation-models gemini gpt-4o gpt-4v llama3
Language:Python 22
paulgavrikov / vlm_shapebias
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
iclr2025 shape-bias steering-behaviors vision-language-models
Language:Python 21
ytaek-oh / awesome-vl-compositionality
Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.
vision-language-models vision-language-compositionality
19
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
image-text-retrieval vision-language-models zero-shot-classification compositionality
Language:Python 18
danelpeng / Awesome-Continual-Leaning-with-PTMs
This is a curated list of "Continual Learning with Pretrained Models" research.
adapters continual-learning embodied-ai large-language-models mixture-of-experts parameter-efficient-tuning pretrained-models prompt-tuning awesome diffusion-models vision-language-models
16
chu0802 / SnD
This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models, accepted to ECCV'24
eccv eccv2024 continual-learning vision-language-models
Language:Python 10
auniquesun / Point-Cache
[CVPR 2025] Official implementation of the paper "Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis"
3d-domain-generalization generalizable-point-cloud-recognition large-multimodal-3d-models vision-language-models 3d-point-cloud-analysis 3d-point-cloud-recognition 3d-robustness generalizable-point-cloud-a large-3d-models robust-point-cloud-analysis robust-point-cloud-recognition cvpr2025 huggingface huggingface-datasets
Language:Jupyter Notebook 7
s-vco / s-vco
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
preference-optimization vision-language-models alignment-algorithms
Language:Python 7
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
spatial-reasoning vision-language-models
Language:Python 7
AaltoML / BayesVLM
Code for Post-hoc Probabilistic Vision-Language Models
active-learning bayesian-deep-learning clip siglip vision-language-models zero-shot-learning
Language:Python 5
andrewliao11 / Q-Spatial-Bench-code
Official repo of the paper "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"
spatial-reasoning vision-language-models
Language:Python 4
sitamgithub-MSIT / PicQ
PicQ: Demo for MiniCPM-o 2.6 to answer questions about images using natural language.
gradio huggingface-spaces huggingface-transformers question-answering multilingual-models minicpm-v generative-ai python vision-language-models minicpm
Language:Python 4
akskuchi / dHM-visual-storytelling
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition – EMNLP 2024 (Findings)
evaluation natural-language-generation vision-language-models visual-storytelling
Language:Python 3
fork123aniket / Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
conversational-agent conversational-ai conversational-bot conversational-interface internvl internvl2 multimodal multimodal-data multimodal-deep-learning multimodal-large-language-models multimodal-learning vision-language vision-language-learning vision-language-model vision-language-models vision-language-navigation vision-language-transformer generative-ai
Language:Python 3
Shengwei-Peng / TOCFL-MultiBench
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
benchmark chinese-language deep-learning large-language-models multimodal multimodal-large-language-models natural-language-processing vision-language-models
Language:Python 3
VectorInstitute / VLDBench
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
ai-safety benchmark-framework benchmarking computer-vision datasets deep-learning llm machine-learning multimodal-ai nlp vision-language-models vlms disinformation-detection
Language:JavaScript 3
zwenyu / colearn-plus
Code for Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training [IJCV 2024], Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation [ICCV 2023]
domain-adaptation source-free-domain-adaptation vision-language-models iccv2023 ijcv
Language:Python 3
sitamgithub-MSIT / VidiQA
VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.
gradio gradio-interface huggingface-spaces huggingface-transformers minicpm-v multilingual-models question-answering python generative-ai vision-language-models
Language:Python 2
XiangshengGu / ActionVLM
This project explores the use of large foundational vision-language models in reinforcement learning, where the models function as agents, reward functions, or reward function code generators in unseen environments given a state and a goal.
reinforcementlearning vision-language-models
2
Alchemist-Aloha / screengpt
ScreenGPT is a project that leverages LLM to understand the screen content. It provides response based on the user defined prompts and the screen content. You need an OpenAI compatible API key to use this software.
gpt large-language-model llm openai-api screen-capture screenshot vision-language-models
Language:Python 0
amazon-science / THRONE
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM generated text.
benchmark cvpr2024 hallucination hallucination-evaluation hallucinations large-language-model large-language-models large-vision-language-model large-vision-language-models vision-language-model vision-language-models
Language:Python
HeathSun / AdaSeg4MR
An innovative mixed reality (MR) pipeline that integrates real-time instance segmentation and speech-guided natural language interaction. It aims to create a more intuitive and immersive experience for users interacting with virtual and real-world environments.
computer-vision groq-api instance-segmentation large-language-models llama mixed-reality natural-language-processing vision-language-models ai-agent text-to-speech voice-control
Language:Python

vision-language-models

baaivision / EVE

snap-research / MyVLM

BAAI-Agents / GPA-LM

baaivision / DenseFusion

NishilBalar / Awesome-LVLM-Hallucination

OpenGVLab / PIIP

yu-rp / apiprompting

mbzuai-oryx / GeoPixel

drive-bench / toolkit

elkhouryk / RS-TransCLIP

erfanshayegani / Jailbreak-In-Pieces

lezhang7 / SAIL

vanillaer / CPL-ICML2024

D2I-Group / awesome-vision-time-series

jiayuww / SpatialEval

paulgavrikov / vlm_shapebias

ytaek-oh / awesome-vl-compositionality

ytaek-oh / fsc-clip

danelpeng / Awesome-Continual-Leaning-with-PTMs

chu0802 / SnD

auniquesun / Point-Cache

s-vco / s-vco

sled-group / COMFORT

AaltoML / BayesVLM

andrewliao11 / Q-Spatial-Bench-code

sitamgithub-MSIT / PicQ

akskuchi / dHM-visual-storytelling

fork123aniket / Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Shengwei-Peng / TOCFL-MultiBench

VectorInstitute / VLDBench

zwenyu / colearn-plus

sitamgithub-MSIT / VidiQA

XiangshengGu / ActionVLM

Alchemist-Aloha / screengpt

amazon-science / THRONE

HeathSun / AdaSeg4MR