secutron / Easy-Wav2Lip

Colab for making Wav2Lip high quality and easy to use

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Something broke! Click the colab link for a fixed version! Unfortunately, images don't seem to be working, just videos.

Audio to video lipsyncing made easy!

Colab link: https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/v4/Easy_Wav2Lip_V4_fixed.ipynb

Open In Colab

Other projects called on (probably more than this but here's what I'm aware of):
https://github.com/sczhou/CodeFormer/blob/master/README.md
https://github.com/TencentARC/GFPGAN/blob/master/README.md
https://github.com/xinntao/Real-ESRGAN

Not only was this built on the shoulders of giants, I'm not even very good at coding and I practically used Bing AI chat to do it all for me.

However I may offer some support in this discord:
Invite link: https://discord.gg/FNZR9ETwKY
Wav2Lip channel: https://discord.com/channels/667279414681272320/1076077584330280991

Best practices:

Video files:

  • Must have a face in all frames or Wav2Lip will fail
  • Crop or mask out faces you don't want to lipsync or it'll choose randomly.
  • Use h264 .mp4 - other file types may be supported but this definitely works
  • Static images should work too, at least .jpg and .png files do.
  • Use a small file in every way (try <720p, <30 seconds, 30fps etc. - Bigger files may work but are usually the reason it fails)
  • Start with a really tiny clip just to get used to the process, only once you're familiar should you try bigger files to see if they work.

Audio files:

  • Ideally just encode it into your video file - the best results come from lining up the speech to the actions!
  • OR
  • Select your audio file separately
  • I'm not certain what filetypes are supported, at least .wav and .mp3 work.

Advanced Tweaking:

Upscaling:

  • gfpgan works best for live action videos. I personally haven't tried Animated / game / CGI / AI generated videos, so they may get better results using codeformer or ESRGAN. Let me know!

ESRGAN:

RealESRGAN_x4plus.pth is downloaded by default, but you can find many different models for ESRGAN here. Upload one of those to your Google Drive or to your colab runtime then paste the path to the file in ESRGAN_model. I've only tried a few and most didn't work. Let me know if you get somewhere.

codeformer:

  • codeformer_fidelity is a scale from detailed (0) to likeness to the original face (1)
  • A lower number will recreate the face more from scratch and while each frame will look better, it will not look good in motion.
  • A higher number will try to accurately make the same face as the input and so the detail won't be as good.
  • I suggest 0.75.

Padding:

  • If you see hard lines on the face where the generated face meets the original, increase the padding in that direction.
  • It happens most often on the chin, so increase the D value if you see that.
  • Increasing these values too much can offset the mouth or cause other weird glitchyness.
  • I've been told using minus values can help with certain videos but I personally haven't found a use case for that yet.

Batch processing:

  • Name your files with a number at the end, eg. Video1.mp4, Video2.mp4, etc. and put them all in the same folder.
  • Files will be processed in numerical order starting from the one you select. For example, if you select Video3.mp4, it will process Video3.mp4, Video4.mp4, and so on.
  • If you select numbered video files and a non-numbered audio file, it will process each video with the same audio file. Useful for making different images/videos say the same line.
  • Likewise, if you select a non-numbered video file and numbered audio files, it will use the same video for each audio file. Useful for making the same image/video say different things.

Other options:

resolution_scale:

Lower the output resolution for quicker rendering and better hiding of artifacts, at the cost of worse overall image quality.

nosmooth:

Disable face detection smoothing which may fix artifacts, I'm not aware of any downsides to this.

output_suffix

This adds a suffix to your output files so that they don't overwite your originals. Tick include_upscaler_in_suffix if you want the upscaler written at the end of the suffix, good for when you want to compare different upscaler settings or just to remember which one you used.

preview_input

Displays the input video/audio before processing so you can check to make sure you chose the correct file(s). It may only work with .mp4, I just know it didn't work on an .avi I tried. Disabling this will save a few seconds of processing time for each video.

About

Colab for making Wav2Lip high quality and easy to use


Languages

Language:Jupyter Notebook 84.7%Language:Python 13.7%Language:Cuda 1.0%Language:C++ 0.7%