CLIP for Voice
chadbrewbaker opened this issue
Would it be sane to get your model to support text to audio clips like this?
One of the DALL-E 3 engineers has a personal project called Tortoise-TTS, which includes a voice version of CLIP that he calls CLVP.
I think he used lucidrains' CLIP as a template: https://github.com/lucidrains/DALLE-pytorch/blob/58c1e1a4fef10725a79bd45cdb5581c03e3e59e7/dalle_pytorch/dalle_pytorch.py#L272
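To make the idea concrete, here is a rough sketch of the kind of CLIP-style objective I have in mind, with speech clips in place of images. It is not taken from Tortoise-TTS or DALLE-pytorch. The function names, the toy embeddings, and the temperature of 0.07 are all assumptions for illustration, and a real model would produce the embeddings with trained text and audio encoders.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_loss(text_emb, speech_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Hypothetical helper: text_emb[i] and speech_emb[i] are assumed to be
    embeddings of the same utterance; all other pairings are negatives.
    """
    text = [l2_normalize(v) for v in text_emb]
    speech = [l2_normalize(v) for v in speech_emb]
    n = len(text)
    # sim[i][j]: cosine similarity of text i and speech clip j, divided by temperature.
    sim = [[sum(a * b for a, b in zip(t, s)) / temperature for s in speech]
           for t in text]
    # Text -> speech direction: row i should select column i.
    t2s = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Speech -> text direction: column j should select row j.
    sim_t = [list(col) for col in zip(*sim)]
    s2t = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return (t2s + s2t) / 2
```

With correctly paired embeddings the loss should be close to zero, and it should grow when the pairs are shuffled, which is the signal that would let such a model score how well a clip matches a piece of text.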
@VoVoR and @kimihailv, what do you think about this?
Hello. It is an interesting suggestion. However, it is not our priority for now.