Repositories under the cross-modal topic:
A collection of research on knowledge graphs
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
A curated list of different papers and datasets in various areas of audio-visual processing
Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020
Unofficial implementation of Google DeepMind's paper "Objects that Sound"
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)
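Cross-modal knowledge distillation of the kind DistillBEV performs transfers knowledge from a strong teacher in one modality (e.g. LiDAR-based features) to a student in another (multi-camera features). A minimal numpy sketch of a generic feature-imitation loss is below; the function name, shapes, and foreground weighting are illustrative assumptions, not DistillBEV's actual loss.

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, fg_mask, fg_weight=5.0):
    """Generic feature-imitation loss (illustrative, not the paper's loss):
    the student (e.g. camera) feature map is pulled toward the teacher
    (e.g. LiDAR) feature map, with foreground cells up-weighted.
    Shapes: (H, W, C) feature maps, (H, W) boolean foreground mask."""
    diff = (student_feat - teacher_feat) ** 2      # per-cell squared error
    weights = np.where(fg_mask, fg_weight, 1.0)    # emphasize object regions
    return float((weights[..., None] * diff).mean())

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 4, 8))
teacher = rng.normal(size=(4, 4, 8))
mask = rng.random((4, 4)) > 0.5
loss = distillation_loss(student, teacher, mask)
```

Up-weighting foreground cells reflects a common design choice in detection-oriented distillation: most BEV cells are empty background, so an unweighted loss would be dominated by regions that matter little for detection.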
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in our recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
The implementation of the AAAI 2017 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"
Code for the paper "Direct Speech-to-Image Translation"
Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)
Python implementation of cross-modal hashing algorithms
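Cross-modal hashing maps items from different modalities (e.g. images and texts) into a shared binary code space, so retrieval reduces to fast Hamming-distance ranking. A minimal sketch of that retrieval step, with hand-built codes standing in for learned ones:

```python
import numpy as np

def binarize(features):
    """Sign-threshold real-valued embeddings into +/-1 binary codes."""
    return np.where(features >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.
    For +/-1 codes of length B, distance = (B - dot product) / 2."""
    B = query_code.shape[0]
    dists = (B - db_codes @ query_code) // 2
    return np.argsort(dists), dists

# Toy example: a text query retrieving from image codes in the shared space.
# In a real system both sets of codes come from trained hash functions.
img_codes = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [-1, -1,  1,  1],
])
txt_query = np.array([1, -1, 1, -1])  # text whose code matches image 1
order, dists = hamming_rank(txt_query, img_codes)
# image 1 has Hamming distance 0 and is ranked first
```

The learning part of any given hashing algorithm differs (quantization losses, similarity preservation, etc.); the Hamming-ranking step above is what they share.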
[IEEE T-IP 2021] Semantics-aware Adaptive Knowledge Distillation for Cross-modal Action Recognition
DSCNet Visible-Infrared Person ReID (TIFS 2022)
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
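Cross-modal retrieval models of this kind are commonly trained with ranking losses in a shared embedding space. The sketch below shows a generic cross-modal triplet margin loss (not the paper's specific neighborhood-preserving objective); the variable names and margin value are illustrative assumptions.

```python
import numpy as np

def triplet_loss(img, pos_txt, neg_txt, margin=0.2):
    """Hinge-style triplet loss in a shared embedding space:
    an image should be closer to its matching caption (pos)
    than to a non-matching one (neg) by at least `margin`."""
    d_pos = np.linalg.norm(img - pos_txt)
    d_neg = np.linalg.norm(img - neg_txt)
    return float(max(0.0, margin + d_pos - d_neg))

img = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])   # matching caption embedding (close)
neg = np.array([-1.0, 0.0])  # mismatched caption embedding (far)
loss = triplet_loss(img, pos, neg)  # 0.0: the margin is already satisfied
```

Swapping the positive and negative captions makes the constraint violated, and the loss becomes positive, which is what drives the embeddings apart during training.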
Code release for "Collective Deep Quantization for Efficient Cross-Modal Retrieval" (AAAI 2017)