Can this model be applied to other data modalities?

Question

ZDstandup opened this issue 2 years ago

Hi, authors. Can this model be applied to other data modalities, such as audio or text?
Have you tried it?
I hope you can give me some suggestions. Thanks in advance!

Jooyoung Choi · Answer 1 · Tue Aug 09 2022 11:06:24 GMT+0800 (China Standard Time)

Directly applying our method to those modalities might be difficult. Low-frequency audio or masked text may not carry semantics the way low-resolution images do. Meanwhile, Guided-TTS uses an additional classifier to guide text-to-speech models.
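
To make the low-frequency point concrete, here is a toy sketch in plain Python. It is not the authors' code, and the helper names `downsample_upsample` and `moving_average` are made up for illustration. It assumes that "low-resolution" means block-averaging an image and that "low-frequency audio" means smoothing a waveform with a moving average. In this contrived example the coarse layout of the image survives the filter, while the rapidly varying waveform is flattened to zero. Real audio and real images are far less clear-cut.

```python
def downsample_upsample(image, factor):
    """Low-pass a 2D grid by block-averaging, then repeating each average.

    `image` is a list of equal-length rows of floats; both dimensions are
    assumed to be divisible by `factor`.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, factor):
        for left in range(0, width, factor):
            block = [image[r][c]
                     for r in range(top, top + factor)
                     for c in range(left, left + factor)]
            mean = sum(block) / len(block)
            # Write the block mean back at full resolution.
            for r in range(top, top + factor):
                for c in range(left, left + factor):
                    out[r][c] = mean
    return out


def moving_average(signal, window):
    """Low-pass a 1D signal with a simple moving average (no padding)."""
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]


# Toy image: left half dark, right half bright.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
coarse = downsample_upsample(image, 4)

# Toy "audio": a rapidly alternating waveform.
audio = [1.0, -1.0] * 8
smooth = moving_average(audio, 4)

print(coarse == image)                    # the coarse layout is unchanged
print(all(abs(v) < 1e-12 for v in smooth))  # the waveform is averaged away
```

The image here was chosen so that its structure lives entirely at the coarse scale, and the waveform so that its structure lives entirely at the fine scale. This only illustrates why a low-pass reference can be informative in one modality and nearly empty in another.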