High-quality zero-shot lipsync pipeline built on LivePortrait
mvoodarla opened this issue
Hey folks! My team has been exploring zero-shot lipsyncing for a bit and we think we've improved on MuseTalk's quality quite a bit by using LivePortrait to neutralize expression and CodeFormer to enhance. Here's an example.
short.mp4
We wrote a technical blog on it: https://www.sievedata.com/blog/sievesync-zero-shot-lipsync-api-developers
Hope to put out an OSS repo soon too :)
Anything we don't talk about in the blog that we should in our repo release?
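The three-stage pipeline described above (LivePortrait to neutralize expression, a MuseTalk-style lipsync stage, CodeFormer to enhance) could be sketched roughly like this. Note that all function bodies here are hypothetical stand-ins to show stage ordering only, not the real model APIs:

```python
# Sketch of the pipeline order described in the comment above.
# Each stage function is a hypothetical placeholder, not a real wrapper
# around LivePortrait / MuseTalk / CodeFormer.

def neutralize_expression(frames):
    # LivePortrait stage (placeholder): retarget each frame toward a
    # neutral expression so the lipsync model sees a consistent face.
    return [f"neutral({f})" for f in frames]

def lipsync(frames, audio):
    # MuseTalk-style stage (placeholder): drive mouth motion from audio.
    return [f"lipsync({f},{audio})" for f in frames]

def enhance(frames):
    # CodeFormer stage (placeholder): restore facial detail lost upstream.
    return [f"enhance({f})" for f in frames]

def sievesync_pipeline(frames, audio):
    # Neutralize first, then lipsync, then enhance the final frames.
    return enhance(lipsync(neutralize_expression(frames), audio))

print(sievesync_pipeline(["frame0", "frame1"], "speech.wav"))
```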
No CodeFormer, no Stable Diffusion, just Audio2Head and LivePortrait. So you want to attach a price to this open-source software now?
This actually took me 6.5 minutes
final_video.mp4
Just another example of FREE
a6ba35dc-fc58-4108-85fd-478bf88d1241.mp4
It seems a good practical mix of MuseTalk and LivePortrait 👍 @mvoodarla Will it be open-sourced soon?
Hey @ziyaad30, those generations look nice! While we plan to open source relevant parts of our code, the full system is tailored to our infrastructure and wouldn't be directly usable by most developers. Our blog details the steps to achieve this quality for those interested in replicating it.
We charge for the service to cover the significant GPU costs for inference with large Stable Diffusion models. Our pay-per-use model is more accessible than the upfront cost of purchasing hardware. We're committed to open sourcing more as we develop with open source models, but some costs will always remain due to GPU requirements.
We plan to release an OSS repo soon (see the bottom of the blog for details!).
If you can do that and release it, so that those who have the GPU power to run it but can't afford to pay can use it, that would be very good.
here it is! https://github.com/sieve-community/sievesync
@mvoodarla Thanks! Your model's performance is quite good. It seems that your model's main framework is still MuseTalk, I'm curious about how much impact the retargeting module has on the results. Could you provide some examples w/ and w/o retargeting to illustrate the difference?