multi-modal-learning

There are 15 repositories under multi-modal-learning topic.

mlfoundations / open_clip
An open source implementation of CLIP.
computer-vision contrastive-loss deep-learning language-model multi-modal-learning pretrained-models pytorch zero-shot-classification
Language:Python 10365
OFA-Sys / Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language
Language:Python 4562
lyuchenyang / Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks
Language:Python 1558
NVlabs / prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa
Language:Python 1299
lucidrains / x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning
Language:Python 693
jokieleung / awesome-visual-question-answering
A curated list of Visual Question Answering(VQA)(Image/Video Question Answering),Visual Question Generation ,Visual Dialog ,Visual Commonsense Reasoning and related area.
attention-networks awesome-list multi-modal multi-modal-learning vqa
659
OpenRobotLab / EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
3d-vision computer-vision multi-modal-learning robotics
Language:Python 490
zeta
kyegomez / zeta
Build high-performance AI models with modular building blocks
artificial-intelligence deep-learning gpt4 llama2 longnet multi-agent-systems multi-modal multi-modal-learning multi-platform pytorch speech-recognition transformer transformers
Language:Python 424
DmitryRyumin / CVPR-2023-24-Papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis cvpr2024
Language:Python 420
zjukg / KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
awsome-list cross-modal-retrieval entity-alignment entity-linking image-classification image-generation information-extraction knowledge-graph knowledge-graph-embeddings large-language-models multi-modal-fusion multi-modal-knowledge-graph multi-modal-learning paper-list survey surveys visual-question-answering awsome
339
zhengli97 / PromptKD
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
cvpr2024 multi-modal-learning prompt-learning vision-language-model knowledge-distillation clip
Language:Python 237
Ysz2022 / NeRCo
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
iccv iccv2023 low-light-image low-light-image-enhancement multi-modal-learning neural-representation
Language:Python 222
moabarar / nemar
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
multimodal registartion stn image-to-image-translation multi-modal multi-modal-learning affine-transformation deformable-transformation deep-learning cnn pytorch image-registration cvpr2020 multimodal-image-registration
Language:Python 167
huggingface / chug
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
computer-vision dataloading datasets distributed-training document-understanding multi-modal-learning pdf-document webdataset
Language:Python 151
GuanRunwei / Achelous
The official repository of Achelous and Achelous++
4d-mmwave-radar multi-modal-fusion multi-modal-learning multi-task-learning object-detection object-tracking panoptic-perception point-cloud-segmentation semantic-segmentation
Language:Python 145
qizekun / ReCon
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
3d-point-clouds multi-modal-learning representation-learning self-supervised-learning
Language:Python 129
wjun0830 / CGDETR
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
computer-vision detection-transformer detr highlight-detection moment-retrieval multi-modal-learning pytorch temporal-grounding text-video-retrieval video-grounding video-summarization video-understanding
Language:Python 116
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
dataset multi-modal-learning object-detection open-vocabulary-detection referring-expression-comprehension vision-language
Language:Python 108
924973292 / EDITOR
【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
cvpr2024 msvr310 multi-modal-learning person-reid reid rgbnt100 rgbnt201 vehicle-reidentification frequency-analysis token-selection multi-modal
Language:Python 86
WillDreamer / Aurora
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
multi-modal-learning parameter-efficient-tuning
Language:Python 83
likyoo / Multimodal-Remote-Sensing-Toolkit
A python tool to perform deep learning experiments on multimodal remote sensing data.
python pytorch remote-sensing multi-modal-learning
Language:Python 82
HyperDenseNet_pytorch
josedolz / HyperDenseNet_pytorch
Pytorch version of the HyperDenseNet deep neural network for multi-modal image segmentation
hyperdensenet deep-learning 3d-convolutional-network 3d-cnn medical-image-processing medical-image-segmentation multi-modal-imaging multi-modal-learning segmentation image-segmentation pytorch pytorch-cnn
Language:Python 78
RL4M / MRM-pytorch
An official implementation of Advancing Radiograph Representation Learning with Masked Record Modeling (ICLR'23)
chest-xray-images multi-modal-learning pre-trained-model representation-learning self-supervised-learning
Language:Python 74
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
benchmark vision-and-language deep-learning multi-modal-learning pytorch
Language:Python 70
rinnakk / japanese-clip
Japanese CLIP by rinna Co., Ltd.
clip cloob japanese language-model multi-modal-learning pretrained-models vision
Language:Python 68
rentainhe / TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
vqav2 iccv2021 transformer clevr multi-modal vision-and-language visual-question-answering pytorch multi-scale-features dynamic-network attention local-and-global multi-modality visualization multi-modal-learning official
Language:Python 65
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
audio-visual-events audio-visual-learning multi-modal-learning
Language:Python 61
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
llava llm-finetuning mistral-7b multi-modal multi-modal-learning vision-language-model
Language:Python 59
zhjohnchan / awesome-vision-and-language-pretraining
A curated list of vision-and-language pre-training (VLP). :-)
multi-modal-learning pre-training vision-and-language-pre-training
56
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
computer-vision multi-modal-learning audio-classification
Language:Python 54
vishalned / MMEarth-data
This repository contains code to download data for the preprint "MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning"
computer-vision earth-observation multi-modal-learning remote-sensing representation-learning
Language:Python 50
deep-symbolic-mathematics / Multimodal-Math-Pretraining
[ICLR 2024 Spotlight] This is the official code for the paper "SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training"
ai4math ai4science deep-learning multi-modal multi-modal-learning representation-learning symbolic-math symbolic-regression transformers
Language:Python 47
liyichen-cly / MMEA
MMEA: Entity Alignment for Multi-Modal Knowledge Graphs, KSEM 2020
entity-alignment knowledge-graph multi-modal-learning
Language:Python 37
richard-peng-xia / HGCLIP
[arXiv'23] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
graph-representations multi-modal-learning vision-language-model hierarchical-image-classification
Language:Python 33
924973292 / Awesome-Multi-Modal-Object-Re-Identification
Multi-modal Object Re-identification
awesome code-list missing-modal-retrieval multi-modal-learning multi-modal-object-re-identification paper-list person-reidentification vehicle-reidentification
32
jackyjsy / SAM-SLR-v2
SAM-SLR-v2 is an improved version of SAM-SLR for sign language recognition.
graph-convolutional-networks multi-modal-learning sign-language-recognition sign-language-recognition-system
Language:Python 29