> [!IMPORTANT]
> Please help by starring the repo. ⭐
LipWise is a video dubbing tool built on optimized inference for Wav2Lip, a deep learning model for generating lip-synced videos. It processes an input audio clip alongside a reference video of a speaker: Wav2Lip redraws the speaker's lip movements to match the new audio, and state-of-the-art face restoration models such as GFPGAN and CodeFormer then enhance the generated faces, yielding a natural, realistic final output.
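In outline, the per-frame dubbing loop looks roughly like the sketch below. The function names (`detect_face`, `lip_sync`, `restore`) are illustrative stand-ins, not LipWise's actual API:

```python
import numpy as np

# Hypothetical stand-ins for the real components (face detector, Wav2Lip,
# GFPGAN/CodeFormer); names and signatures are illustrative only.
def detect_face(frame):
    # Pretend the face occupies the central half of the frame.
    h, w = frame.shape[:2]
    return (h // 4, w // 4, h // 2, w // 2)  # y, x, height, width

def lip_sync(face_crop, mel_chunk):
    # Stand-in for the Wav2Lip generator: returns a crop of the same shape.
    return face_crop

def restore(face_crop):
    # Stand-in for GFPGAN/CodeFormer face restoration.
    return face_crop

def dub_frame(frame, mel_chunk):
    """One iteration of the dubbing loop: detect -> lip-sync -> restore -> paste."""
    y, x, h, w = detect_face(frame)
    crop = frame[y:y + h, x:x + w]
    synced = restore(lip_sync(crop, mel_chunk))
    out = frame.copy()
    out[y:y + h, x:x + w] = synced  # paste the restored crop back into the frame
    return out

frame = np.zeros((256, 256, 3), dtype=np.uint8)
mel_chunk = np.zeros((80, 16), dtype=np.float32)
result = dub_frame(frame, mel_chunk)
print(result.shape)  # (256, 256, 3)
```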
- Face restoration powered by CodeFormer or GFPGAN.
- Optimized inference:
  - Streamlined inference through the elimination of redundant processes.
  - Multi-threading for the majority of preprocessing steps.
- Unrestricted video compatibility:
  - The requirement of a face in every frame of the video has been lifted, allowing for greater versatility.
- Enhanced face detection using Mediapipe:
  - Masks are generated from facial landmarks, leading to superior pasting results.
  - Facial landmarks are stored as `.npy` files, conserving processing resources when the same video is used repeatedly.
- Effortless setup:
  - With the exception of manual CUDA installation, the setup process is seamless, as outlined below.
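The landmark-caching idea above can be sketched as follows. Here `detect_fn` stands in for the real Mediapipe-based detector, and the cache layout (a `landmark_cache/` directory keyed by a SHA-1 of the path) is an assumption for illustration, not the repository's actual scheme:

```python
import hashlib
from pathlib import Path

import numpy as np

def cached_landmarks(video_path, detect_fn, cache_dir="landmark_cache"):
    """Return facial landmarks for `video_path`, reusing a .npy cache when present.

    `detect_fn` is a stand-in for the real detector (e.g. a Mediapipe FaceMesh
    wrapper); it should return an array of per-frame landmark coordinates.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Key the cache on the absolute path so the same video is recognized again.
    key = hashlib.sha1(str(Path(video_path).resolve()).encode()).hexdigest()
    npy_file = cache / f"{key}.npy"
    if npy_file.exists():
        return np.load(npy_file)       # cache hit: skip detection entirely
    landmarks = detect_fn(video_path)  # cache miss: run the detector once
    np.save(npy_file, landmarks)
    return landmarks
```

On a second run over the same video, the detector is never invoked; the landmarks are loaded straight from disk.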
> [!TIP]
> Use a GPU runtime for faster processing.
- Clone this repository: `git clone https://github.com/pawansharmaaaa/Lip_Wise`
- Install Python > 3.10 from the official site or the Microsoft Store.
- Install winget from the Microsoft Store.
- Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
- Run `setup.bat`.
- Run `launch.bat`.
- Clone this repository: `git clone https://github.com/pawansharmaaaa/Lip_Wise`
- Make sure `python --version` reports a version greater than 3.10.
- Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
- Make `setup.sh` executable: `chmod +x ./setup.sh`
- Run `setup.sh`, e.g. by double-clicking on it.
- Make `launch.sh` executable: `chmod +x ./launch.sh`
- Run `launch.sh`, e.g. by double-clicking on it.
- `setup.bat` / `setup.sh`:
  - Create a virtual environment (venv).
  - Install the requirements inside the venv.
  - Initialize the CodeFormer architecture.
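For illustration, a rough Python equivalent of what the setup scripts are described to do; the real scripts are batch/shell, and the paths and pip invocation here are assumptions:

```python
import subprocess
import sys
import venv
from pathlib import Path

def setup(env_dir="venv", requirements="requirements.txt"):
    """Sketch of the setup steps: create a venv, then install requirements into it.

    CodeFormer architecture initialization would follow here; it is
    model-specific and omitted from this sketch.
    """
    env = Path(env_dir)
    if not env.exists():
        venv.create(env, with_pip=True)  # step 1: create the virtual environment
    # pip lives in Scripts/ on Windows and bin/ elsewhere.
    pip = env / ("Scripts" if sys.platform == "win32" else "bin") / "pip"
    if Path(requirements).exists():
        # step 2: install the requirements inside the venv
        subprocess.run([str(pip), "install", "-r", str(requirements)], check=True)
```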
- Documentation
- Add a directory check at the beginning of inference.
- Optimize preprocessing.
- Clear RAM after `no_face_filter`.
- Make face coordinates reusable:
  - Save facial coordinates as a `.npy` file.
  - Alter the code to also include eye coordinates.
- Merge the data pipeline with the preprocessor:
  - Remove the need to re-crop, re-align and re-warp the image.
- Merge the entire data pipeline:
  - Remove the need to re-crop, re-align, re-normalize, etc.
- Devise a way to keep frames without a face in the video.
- Understand mel spectrograms and the workings of the Wav2Lip model.
- Gradio UI:
  - A tab for video, audio and output.
  - A tab for image, audio and output.
- Inference without a restorer.
- Model improvement:
  - Implement `no_face_filter` as well.
  - Face- and audio-wise lip sync using face recognition.
- A separate tab for TTS.
- Optimize inference.
- Implement checks.
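As a companion to the "Understand mel spectrograms" item above: Wav2Lip conditions on mel-spectrogram chunks of the input audio. Below is a minimal numpy sketch of the triangular mel filterbank that underlies such a spectrogram; the parameter values mirror settings commonly used with Wav2Lip (80 mels, 16 kHz audio) but should be treated as assumptions here:

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=800, sr=16000, fmin=55.0, fmax=7600.0):
    """Build a (n_mels, n_fft // 2 + 1) matrix of triangular mel filters."""
    # Filter centre frequencies are equally spaced on the mel scale.
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    freqs = mel_to_hz(mels)
    bins = np.floor((n_fft + 1) * freqs / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            fb[i, k] = (k - left) / max(centre - left, 1)   # rising slope
        for k in range(centre, right):
            fb[i, k] = (right - k) / max(right - centre, 1)  # falling slope
    return fb

fb = mel_filterbank()
print(fb.shape)  # (80, 401)
```

Multiplying this matrix by an FFT power spectrum collapses the linear frequency bins into perceptually spaced mel bands, which is the representation the lip-sync model reads.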