Bark audio model and talking head additions

Question

sarutobiumon opened this issue a year ago

sarutobiumon commented a year ago

It would be amazing if you could:

Turn the "talking head" images into animated GIFs lip-synced to the WAV audio generated by TTS using Bark (Bark is currently the best and most realistic, emotion-driven audio model that is free to use, even better than the best commercial closed-source model, ElevenLabs).
Then generate an MP4 from the combination of the animated GIF and the WAV audio on the fly, replacing the starting-point animated GIF on the screen.
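
For the second step, one possible approach is to call ffmpeg from Python. This is only a rough sketch and is not taken from any of the projects linked below. It assumes ffmpeg is installed and on PATH; the function names and the file names in the usage note are made up for illustration. The GIF is looped until the audio ends, and the frame size is rounded down to even dimensions because libx264 with yuv420p requires them.

```python
import shutil
import subprocess
from pathlib import Path


def build_mux_command(gif_path, wav_path, mp4_path):
    """Return an ffmpeg argument list that loops the GIF under the WAV track.

    Hypothetical helper for illustration; not part of any linked project.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-stream_loop", "-1",      # loop the next input (the GIF) indefinitely
        "-i", str(gif_path),
        "-i", str(wav_path),
        # Round width and height down to even numbers for yuv420p.
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",     # widely supported pixel format for players
        "-c:a", "aac",
        "-shortest",               # stop when the shorter stream (the audio) ends
        str(mp4_path),
    ]


def mux_gif_and_wav(gif_path, wav_path, mp4_path):
    """Run ffmpeg to write the MP4 and return its path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_mux_command(gif_path, wav_path, mp4_path), check=True)
    return Path(mp4_path)
```

Calling mux_gif_and_wav("head.gif", "speech.wav", "out.mp4") would then produce the MP4 to show in place of the GIF. The lip-sync itself would still have to come from one of the tools listed below; this only covers combining the two files.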

This can be done by integrating code from one of the following choices:

from this one-click install GUI: https://www.youtube.com/watch?v=f_NUZDBiaZg
or using SadTalker: https://www.youtube.com/watch?v=aJIq_UoZv24
or this Google Colab Python code (supports 30+ languages):
https://spltech.co.uk/using-wav2lip-and-google-cloud-wavenet-to-create-voice-overs-in-more-than-30-languages/
or using VideoReTalking: Audio-based Lip Synchronization for Talking Head Video
https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb
Demo https://www.youtube.com/watch?v=CgZVKSkdtRo

Bark oobabooga TTS extension:
https://github.com/wsippel/bark_tts

Dongchao Yang · Answer 1 · Sun Apr 30 2023 21:10:39 GMT+0800 (China Standard Time)

Hi, thanks for your suggestions. We will try to add these models to AudioGPT as soon as possible.