dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

How to use the modal-type embedding in the output of the encoder?

leyuan-sun opened this issue

Sorry, my question is: how can I use the modal-type embedding to tell which features in the encoder output belong to which modality? Thanks in advance!
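
To make the question concrete, here is a minimal toy sketch in plain Python. It is not ViLT's actual code, and every name in it (`modal_type_table`, `add_modal_type`, `toy_encoder`, `split_by_modality`) is hypothetical. It assumes the common setup where the modal-type vector is added to each token embedding before the encoder, text tokens are concatenated ahead of image tokens, and the encoder preserves sequence length and order. Under those assumptions the output features are not labeled by the modal-type embedding itself; they can be separated by position instead.

```python
# Toy illustration only: hypothetical names, not ViLT's API.

TEXT_TYPE, IMAGE_TYPE = 0, 1

# Assumed toy modal-type embedding table: one vector per modality.
modal_type_table = {TEXT_TYPE: [0.1, 0.1], IMAGE_TYPE: [-0.1, -0.1]}


def add_modal_type(tokens, type_id):
    """Add the modal-type vector for type_id to every token embedding."""
    type_vec = modal_type_table[type_id]
    return [[t + m for t, m in zip(tok, type_vec)] for tok in tokens]


def toy_encoder(sequence):
    """Stand-in for a transformer encoder: keeps sequence length and order."""
    return [[2.0 * v for v in tok] for tok in sequence]


def split_by_modality(outputs, num_text_tokens):
    """Recover per-modality features by position in the output sequence."""
    return outputs[:num_text_tokens], outputs[num_text_tokens:]


text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 text tokens
image_tokens = [[0.5, 0.5], [0.2, 0.8]]             # 2 image patches

# Text first, then image, each with its modal-type vector added.
sequence = add_modal_type(text_tokens, TEXT_TYPE) + add_modal_type(image_tokens, IMAGE_TYPE)
outputs = toy_encoder(sequence)

# The first len(text_tokens) outputs are text features; the rest are image features.
text_feats, image_feats = split_by_modality(outputs, len(text_tokens))
modality_ids = [TEXT_TYPE] * len(text_feats) + [IMAGE_TYPE] * len(image_feats)
```

If the real model pads or masks tokens, or concatenates the modalities in a different order, the split index would have to be taken from the actual input lengths rather than assumed as above.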