There are 26 repositories under the multimodal-learning topic.
Reading list for research topics in multimodal machine learning
An open-source framework for training large multimodal models.
A curated list of Multimodal Related Research.
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. Star the repository to support visual intelligence development!
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
A Comparative Framework for Multimodal Recommender Systems
Papers, code and datasets about deep learning and multi-modal learning for video analysis
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for the text data
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Research Trends in LLM-guided Multimodal Learning.
A collection of resources on applications of multi-modal learning in medical imaging.
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
List of academic resources on Multimodal ML for Music
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 and 2024 conferences. Explore the latest advancements in acoustics, speech, and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
A curated list of self-supervised multimodal learning resources.
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23