vision-language-learning

There are 13 repositories under the vision-language-learning topic.

AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
chatbot llama3 multimodal multimodal-large-language-models multimodality qwen vision-language-learning vision-language-model
Language:Python 799
shikiw / OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
chatbot chatgpt gpt-4 large-multimodal-models llama multimodal vision-language-learning vision-language-model
Language:Python 316
RLHF-V / RLAIF-V
[CVPR'25] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
chatbot cvpr2025 gpt-4v llava llava-next minicpm-v multimodal rlaif-v vision-language-learning
Language:Python 307
shikiw / Modality-Integration-Rate
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
chatbot gpt-4o large-multimodal-models llama llava multimodal vision-language-learning vision-language-model
Language:Python 96
YunzeMan / Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
3d-scene-understanding deep-learning multi-modal-learning multimodal-learning vision-language-learning vision-language-model
Language:Python 36
LooperXX / ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
multi-modal-learning vision-language vision-language-learning vision-language-pretraining
Language:Python 11
SHTUPLUS / GITM-MR
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
vision-and-language vision-language-model vision-and-language-pre-training vision-language-dataset vision-language-learning
Language:Python 6
yubin1219 / CrossVLT
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
pytorch referring-image-segmentation vision-language-learning
Language:Python 5
lyuchenyang / Dialogue-to-Video-Retrieval
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
deep-learning machine-learning multimedia neural-networks video-retrieval vision-language-learning
Language:Python 3
fork123aniket / Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
agentic-ai agentic-rag agentic-workflow generative-ai generative-ai-model internvl2 multimodal multimodal-data multimodal-deep-learning multimodal-large-language-models multimodal-learning story-generation vision-language vision-language-learning vision-language-model vision-language-transformer
Language:Python 2
fork123aniket / Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
conversational-agent conversational-ai conversational-bot conversational-interface generative-ai internvl internvl2 multimodal multimodal-data multimodal-deep-learning multimodal-large-language-models multimodal-learning vision-language vision-language-learning vision-language-model vision-language-models vision-language-navigation vision-language-transformer
Language:Python 2
abhinav-neil / socratic-models
Socratic models for multimodal reasoning & image captioning
chain-of-thought clip flan-t5 gpt-3 image-captioning multimodal-learning vision-language-learning visual-question-answering
Language:Jupyter Notebook 1
Ravi-Teja-konda / TunedLlavaDelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model
Language:Python 1