There are 40 repositories under the multimodal-deep-learning topic.
LAVIS - A One-stop Library for Language-Vision Intelligence
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
A flexible package for combining tabular data with text and images using Wide and Deep models in PyTorch
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
awesome grounding: A curated list of research papers in visual grounding
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
A collection of resources on applications of multi-modal learning in medical imaging.
Collects the latest ECCV (European Conference on Computer Vision) results, including papers, code, and demo videos; recommendations welcome!
Recent Advances in Vision and Language Pre-training (VLP)
Deep learning based content moderation from text, audio, video & image input modalities.
Multimodal Sarcasm Detection Dataset
List of academic resources on Multimodal ML for Music
A survey of multimodal learning research.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
A comprehensive reading list for Emotion Recognition in Conversations
Paper List of Pre-trained Foundation Recommender Models
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
A curated list of awesome vision and language resources for earth observation.
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!