jychoi118 / ilvr_adm

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

Can this model be applied to other data modalities?

ZDstandup opened this issue

Hi, authors. Can this model be applied to other data modalities, such as audio or text?
Have you tried it?
I hope you can give me some suggestions. Thanks in advance!

Directly applying our method to those modalities might be difficult. Low-frequency audio or masked text may not carry semantics the way low-resolution images do. Meanwhile, Guided-TTS uses an additional classifier to guide text-to-speech models.
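
For context, the ILVR refinement step swaps the low-frequency content of each denoising proposal for that of the noised reference. Below is a minimal sketch of that step on a 1-D signal. It assumes a hypothetical average-pool-and-repeat filter standing in for the image resizing used in the paper; `low_pass` and `ilvr_refine` are illustrative names, not functions from this repository.

```python
import numpy as np

def low_pass(x, factor):
    """Hypothetical 1-D low-pass filter: average-pool by `factor`, then repeat back up.

    Assumption: this stands in for the downsample/upsample image filter of ILVR.
    """
    if len(x) % factor != 0:
        raise ValueError("signal length must be divisible by factor")
    pooled = x.reshape(-1, factor).mean(axis=1)   # downsample
    return np.repeat(pooled, factor)              # upsample to the original length

def ilvr_refine(x_proposal, y_noised, factor):
    """Keep the proposal's high frequencies, take low frequencies from the reference."""
    return x_proposal - low_pass(x_proposal, factor) + low_pass(y_noised, factor)

# Toy usage: random arrays stand in for a denoising proposal and a noised reference.
rng = np.random.default_rng(0)
proposal = rng.normal(size=16)
reference = rng.normal(size=16)
refined = ilvr_refine(proposal, reference, factor=4)
```

After the step, the refined signal shares its low-pass component with the reference. Whether that component carries any semantics for audio or text is exactly the concern raised above, and the sketch does not address it.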