invictus717 / MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Home Page: https://arxiv.org/abs/2307.10802

Multiple modals

HaibiaoXuan opened this issue

How can multiple modalities be used at the same time for a single task, such as text+image, text+audio, or text+point cloud?

You can simply concatenate these multimodal embeddings and then feed them to the shared encoder.
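
For illustration, here is a minimal sketch of what that concatenation could look like. It is not code from this repository: `EMBED_DIM`, `concat_modalities`, and `shared_encoder` are hypothetical names, the token counts are made up, NumPy stands in for the real framework, and the encoder is an identity placeholder. The sketch assumes each modality's tokenizer already produces tokens of the same width, and that the sequences are joined along the token axis before the shared encoder.

```python
import numpy as np

EMBED_DIM = 768  # assumed shared token width; every modality must map to this size


def concat_modalities(*token_seqs):
    """Join per-modality token sequences along the token (sequence) axis.

    Each input is expected to have shape (batch, num_tokens_i, EMBED_DIM).
    """
    for seq in token_seqs:
        if seq.ndim != 3 or seq.shape[-1] != EMBED_DIM:
            raise ValueError(
                f"expected shape (batch, tokens, {EMBED_DIM}), got {seq.shape}"
            )
    return np.concatenate(token_seqs, axis=1)


def shared_encoder(tokens):
    """Hypothetical placeholder for the shared encoder.

    A real encoder would transform the tokens; this stand-in returns them
    unchanged so the example only demonstrates the shapes involved.
    """
    return tokens


# Random arrays stand in for the outputs of the text and image tokenizers.
rng = np.random.default_rng(0)
text_tokens = rng.standard_normal((2, 16, EMBED_DIM))    # 16 text tokens per sample
image_tokens = rng.standard_normal((2, 196, EMBED_DIM))  # 196 image patches per sample

fused = concat_modalities(text_tokens, image_tokens)
features = shared_encoder(fused)
print(features.shape)  # (2, 212, 768)
```

The same pattern would apply to text+audio or text+point cloud, as long as every modality's tokens share the same embedding width.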

Do you mean concatenating the pre-classification vectors together and then feeding them to the classifier?

Simple concatenation is enough.