There are 26 repositories under the multimodal-learning topic.
Reading list for research topics in multimodal machine learning
An open-source framework for training large multimodal models.
A curated list of Multimodal Related Research.
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. Star the repository to support visual intelligence development!
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
A Comparative Framework for Multimodal Recommender Systems
Papers, code and datasets about deep learning and multi-modal learning for video analysis
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for the text data
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Research Trends in LLM-guided Multimodal Learning.
A collection of resources on applications of multi-modal learning in medical imaging.
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
List of academic resources on Multimodal ML for Music
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 and 2024 conferences. Explore the latest advancements in acoustics, speech, and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
A curated list of self-supervised multimodal learning resources.
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23