Multiple modals
HaibiaoXuan opened this issue
HaibiaoXuan commented
How can I use multiple modalities at the same time for a task, such as text+image, text+audio, or text+point cloud?
Yiyuan Zhang commented
You can simply concatenate these multimodal embeddings and then feed them to the shared encoder.
Xu Hong Bo commented
> You can simply concatenate these multimodal embeddings and then feed them to the shared encoder.
Does this mean concatenating the vectors from just before classification and then feeding them to the classifier?
Yiyuan Zhang commented
Simple concatenation is enough.
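For readers who want something concrete, here is a minimal sketch of the concatenation suggested above. It is not code from the repository: `fake_tokenizer`, `fuse_tokens`, `toy_shared_encoder`, and `EMBED_DIM` are hypothetical names, the tokenizers are replaced by random arrays, and the "shared encoder" is a single toy layer with mean pooling. The thread does not say which axis to concatenate along; this sketch assumes the token (sequence) axis, with every modality already projected to the same embedding width.

```python
import numpy as np

EMBED_DIM = 8  # hypothetical shared embedding width (assumption)
rng = np.random.default_rng(0)


def fake_tokenizer(num_tokens, dim=EMBED_DIM):
    """Stand-in for a modality-specific tokenizer.

    Returns random embeddings of shape (num_tokens, dim).
    """
    return rng.standard_normal((num_tokens, dim))


def fuse_tokens(*modalities):
    """Concatenate per-modality token embeddings along the token axis."""
    dims = {m.shape[1] for m in modalities}
    if len(dims) != 1:
        raise ValueError(f"embedding widths differ: {sorted(dims)}")
    return np.concatenate(modalities, axis=0)


def toy_shared_encoder(tokens, weight):
    """Toy stand-in for the shared encoder.

    Applies one linear layer with tanh, then mean-pools over tokens.
    """
    hidden = np.tanh(tokens @ weight)  # (total_tokens, dim)
    return hidden.mean(axis=0)         # (dim,)


text_tokens = fake_tokenizer(5)    # e.g. 5 text tokens
image_tokens = fake_tokenizer(16)  # e.g. 16 image patches

fused_tokens = fuse_tokens(text_tokens, image_tokens)
encoder_weight = rng.standard_normal((EMBED_DIM, EMBED_DIM))
features = toy_shared_encoder(fused_tokens, encoder_weight)

print(fused_tokens.shape, features.shape)
```

The pooled `features` vector would then go to a task head such as a classifier. If you instead want to fuse the per-modality vectors from just before classification, as asked above, the analogous change is to encode each modality separately and concatenate the pooled vectors along the feature axis.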