Repositories under the cross-modal topic:
A collection of research on knowledge graphs
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
A curated list of different papers and datasets in various areas of audio-visual processing
Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020
Unofficial implementation of Google DeepMind's paper "Objects that Sound"
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
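Cross-modal knowledge distillation of the kind DistillBEV performs transfers knowledge from a strong teacher in one modality (e.g. LiDAR-based features) to a student in another (multi-camera features). A minimal numpy sketch of a generic feature-imitation loss is below; the function name, shapes, and foreground weighting are illustrative assumptions, not DistillBEV's actual loss.

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, fg_mask, fg_weight=5.0):
    """Generic feature-imitation loss (illustrative, not the paper's loss):
    the student (e.g. camera) feature map is pulled toward the teacher
    (e.g. LiDAR) feature map, with foreground cells up-weighted.
    Shapes: (H, W, C) feature maps, (H, W) boolean foreground mask."""
    diff = (student_feat - teacher_feat) ** 2      # per-cell squared error
    weights = np.where(fg_mask, fg_weight, 1.0)    # emphasize object regions
    return float((weights[..., None] * diff).mean())

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 4, 8))
teacher = rng.normal(size=(4, 4, 8))
mask = rng.random((4, 4)) > 0.5
loss = distillation_loss(student, teacher, mask)
```

Up-weighting foreground cells reflects a common design choice in detection-oriented distillation: most BEV cells are empty background, so an unweighted loss would be dominated by regions that matter little for detection.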
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in our recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
The implementation of the AAAI 2017 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"
Code for the paper "Direct Speech-to-Image Translation"
Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)
Python implementation of cross-modal hashing algorithms
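Cross-modal hashing maps items from different modalities (e.g. images and texts) into a shared binary code space, so retrieval reduces to fast Hamming-distance ranking. A minimal sketch of that retrieval step, with hand-built codes standing in for learned ones:

```python
import numpy as np

def binarize(features):
    """Sign-threshold real-valued embeddings into +/-1 binary codes."""
    return np.where(features >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.
    For +/-1 codes of length B, distance = (B - dot product) / 2."""
    B = query_code.shape[0]
    dists = (B - db_codes @ query_code) // 2
    return np.argsort(dists), dists

# Toy example: a text query retrieving from image codes in the shared space.
# In a real system both sets of codes come from trained hash functions.
img_codes = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [-1, -1,  1,  1],
])
txt_query = np.array([1, -1, 1, -1])  # text whose code matches image 1
order, dists = hamming_rank(txt_query, img_codes)
# image 1 has Hamming distance 0 and is ranked first
```

The learning part of any given hashing algorithm differs (quantization losses, similarity preservation, etc.); the Hamming-ranking step above is what they share.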
[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition
DSCNet Visible-Infrared Person ReID (TIFS 2022)
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
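Cross-modal retrieval models of this kind are commonly trained with ranking losses in a shared embedding space. The sketch below shows a generic cross-modal triplet margin loss (not the paper's specific neighborhood-preserving objective); the variable names and margin value are illustrative assumptions.

```python
import numpy as np

def triplet_loss(img, pos_txt, neg_txt, margin=0.2):
    """Hinge-style triplet loss in a shared embedding space:
    an image should be closer to its matching caption (pos)
    than to a non-matching one (neg) by at least `margin`."""
    d_pos = np.linalg.norm(img - pos_txt)
    d_neg = np.linalg.norm(img - neg_txt)
    return float(max(0.0, margin + d_pos - d_neg))

img = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])   # matching caption embedding (close)
neg = np.array([-1.0, 0.0])  # mismatched caption embedding (far)
loss = triplet_loss(img, pos, neg)  # 0.0: the margin is already satisfied
```

Swapping the positive and negative captions makes the constraint violated, and the loss becomes positive, which is what drives the embeddings apart during training.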
Code release for "Collective Deep Quantization for Efficient Cross-Modal Retrieval" (AAAI 2017)