> [!IMPORTANT]
> Please help by starring the repo. ⭐
LipWise is a video dubbing tool built on optimized inference for Wav2Lip, a deep learning model for generating lip-synced videos. It processes an input audio clip alongside a reference video of a speaker: Wav2Lip redraws the speaker's lip movements to match the new audio, and state-of-the-art face restoration models such as GFPGAN and CodeFormer then enhance the generated faces, yielding a natural, realistic final output.
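In outline, the per-frame dubbing loop looks roughly like the sketch below. The function names (`detect_face`, `lip_sync`, `restore`) are illustrative stand-ins, not LipWise's actual API:

```python
import numpy as np

# Hypothetical stand-ins for the real components (face detector, Wav2Lip,
# GFPGAN/CodeFormer); names and signatures are illustrative only.
def detect_face(frame):
    # Pretend the face occupies the central half of the frame.
    h, w = frame.shape[:2]
    return (h // 4, w // 4, h // 2, w // 2)  # y, x, height, width

def lip_sync(face_crop, mel_chunk):
    # Stand-in for the Wav2Lip generator: returns a crop of the same shape.
    return face_crop

def restore(face_crop):
    # Stand-in for GFPGAN/CodeFormer face restoration.
    return face_crop

def dub_frame(frame, mel_chunk):
    """One iteration of the dubbing loop: detect -> lip-sync -> restore -> paste."""
    y, x, h, w = detect_face(frame)
    crop = frame[y:y + h, x:x + w]
    synced = restore(lip_sync(crop, mel_chunk))
    out = frame.copy()
    out[y:y + h, x:x + w] = synced  # paste the restored crop back into the frame
    return out

frame = np.zeros((256, 256, 3), dtype=np.uint8)
mel_chunk = np.zeros((80, 16), dtype=np.float32)
result = dub_frame(frame, mel_chunk)
print(result.shape)  # (256, 256, 3)
```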
- Face restoration powered by CodeFormer or GFPGAN.
- Optimized inference:
  - Streamlined inference through the elimination of redundant processes.
  - Multi-threading for the majority of preprocessing steps.
- Unrestricted video compatibility:
  - The requirement of a face in every frame of the video has been lifted, allowing for greater versatility.
- Enhanced face detection using Mediapipe:
  - Masks are generated from facial landmarks, leading to superior pasting results.
  - Facial landmarks are stored as `.npy` files, conserving processing resources when the same video is used repeatedly.
- Effortless setup:
  - With the exception of manual CUDA installation, the setup process is seamless, as outlined below.
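The landmark-caching idea above can be sketched as follows. Here `detect_fn` stands in for the real Mediapipe-based detector, and the cache layout (a `landmark_cache/` directory keyed by a SHA-1 of the path) is an assumption for illustration, not the repository's actual scheme:

```python
import hashlib
from pathlib import Path

import numpy as np

def cached_landmarks(video_path, detect_fn, cache_dir="landmark_cache"):
    """Return facial landmarks for `video_path`, reusing a .npy cache when present.

    `detect_fn` is a stand-in for the real detector (e.g. a Mediapipe FaceMesh
    wrapper); it should return an array of per-frame landmark coordinates.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Key the cache on the absolute path so the same video is recognized again.
    key = hashlib.sha1(str(Path(video_path).resolve()).encode()).hexdigest()
    npy_file = cache / f"{key}.npy"
    if npy_file.exists():
        return np.load(npy_file)       # cache hit: skip detection entirely
    landmarks = detect_fn(video_path)  # cache miss: run the detector once
    np.save(npy_file, landmarks)
    return landmarks
```

On a second run over the same video, the detector is never invoked; the landmarks are loaded straight from disk.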
> [!TIP]
> Use a GPU runtime for faster processing.
- Clone this repository: `git clone https://github.com/pawansharmaaaa/Lip_Wise`
- Install Python > 3.10 from the official site or the Microsoft Store.
- Install winget from the Microsoft Store.
- Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
- Run `setup.bat`.
- Run `launch.bat`.
- Clone this repository: `git clone https://github.com/pawansharmaaaa/Lip_Wise`
- Make sure `python --version` reports a version greater than 3.10.
- Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
- Make `setup.sh` executable: `chmod +x ./setup.sh`
- Run `setup.sh`, e.g. by double-clicking on it.
- Make `launch.sh` executable: `chmod +x ./launch.sh`
- Run `launch.sh`, e.g. by double-clicking on it.
- `setup.bat` / `setup.sh`:
  - Create a virtual environment (venv).
  - Install the requirements inside the venv.
  - Initialize the CodeFormer architecture.
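For illustration, a rough Python equivalent of what the setup scripts are described to do; the real scripts are batch/shell, and the paths and pip invocation here are assumptions:

```python
import subprocess
import sys
import venv
from pathlib import Path

def setup(env_dir="venv", requirements="requirements.txt"):
    """Sketch of the setup steps: create a venv, then install requirements into it.

    CodeFormer architecture initialization would follow here; it is
    model-specific and omitted from this sketch.
    """
    env = Path(env_dir)
    if not env.exists():
        venv.create(env, with_pip=True)  # step 1: create the virtual environment
    # pip lives in Scripts/ on Windows and bin/ elsewhere.
    pip = env / ("Scripts" if sys.platform == "win32" else "bin") / "pip"
    if Path(requirements).exists():
        # step 2: install the requirements inside the venv
        subprocess.run([str(pip), "install", "-r", str(requirements)], check=True)
```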
- Documentation
- Add a directory check at the beginning of inference.
- Optimize preprocessing.
- Clear RAM after `no_face_filter`.
- Make face coordinates reusable:
  - Save facial coordinates as a `.npy` file.
  - Alter the code to also include eye coordinates.
- Merge the data pipeline with the preprocessor:
  - Remove the need to re-crop, re-align and re-warp the image.
- Merge the entire data pipeline:
  - Remove the need to re-crop, re-align, re-normalize, etc.
- Devise a way to keep frames without a face in the video.
- Understand mel spectrograms and the workings of the Wav2Lip model.
- Gradio UI:
  - A tab for video, audio and output.
  - A tab for image, audio and output.
- Inference without a restorer.
- Model improvement:
  - Implement `no_face_filter` as well.
  - Face- and audio-wise lip sync using face recognition.
- A separate tab for TTS.
- Optimize inference.
- Implement checks.
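As a companion to the "Understand mel spectrograms" item above: Wav2Lip conditions on mel-spectrogram chunks of the input audio. Below is a minimal numpy sketch of the triangular mel filterbank that underlies such a spectrogram; the parameter values mirror settings commonly used with Wav2Lip (80 mels, 16 kHz audio) but should be treated as assumptions here:

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=800, sr=16000, fmin=55.0, fmax=7600.0):
    """Build a (n_mels, n_fft // 2 + 1) matrix of triangular mel filters."""
    # Filter centre frequencies are equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    freqs = mel_to_hz(mels)
    bins = np.floor((n_fft + 1) * freqs / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            fb[i, k] = (k - left) / max(centre - left, 1)   # rising slope
        for k in range(centre, right):
            fb[i, k] = (right - k) / max(right - centre, 1)  # falling slope
    return fb

fb = mel_filterbank()
print(fb.shape)  # (80, 401)
```

Multiplying this matrix by an FFT power spectrum collapses the linear frequency bins into perceptually spaced mel bands, which is the representation the lip-sync model reads.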