There are 0 repositories under the vision-language-learning topic.
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
Socratic models for multimodal reasoning & image captioning
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.