How can the driven-audio feature a and the landmark representation l be used in the cross-attention module?

Question

Haoqing-Wang opened this issue 10 months ago · comments

As far as I can tell, the driven-audio feature a and the landmark representation l are each a single vector, not a sequence of vectors. How can they be used as Key and Value in the cross-attention module?

WoofGH · Answer 1 · Tue Apr 16 2024 19:06:56 GMT+0800 (China Standard Time)

Did you understand how this works? I'm totally confused right now😭.