haiderasad / Lip_Wise

LipWise is a powerful video dubbing tool that leverages optimized inference for Wav2Lip and also uses face restoration models such as GFPGAN and CodeFormer. Together, these models match the lip movements of the reference video to the new audio, producing a natural and realistic final output.

Important

Please help by starring the repo. 😁



Introduction

LipWise is a powerful video dubbing tool that leverages optimized inference for Wav2Lip, a deep learning model for generating lip-synced videos. It takes an input audio clip and a reference video featuring a speaker. Wav2Lip matches the speaker's lip movements to the new audio, and a face restoration model such as GFPGAN or CodeFormer then restores detail in the generated face region, resulting in a natural and realistic final output.

  • Face Restoration Empowered by CodeFormer or GFPGAN:
    • Streamlined inference through the elimination of redundant processes.
    • Enhanced efficiency with multi-threading implemented for the majority of preprocessing steps.
  • Unrestricted Video Compatibility:
    • The limitation of requiring a face in every frame of the video has been lifted, allowing for greater versatility.
  • Enhanced face detection using MediaPipe:
    • Masks are generated from facial landmarks, leading to superior pasting results.
    • Facial landmarks are stored as .npy files, which saves processing time when the same video is used repeatedly.
  • Effortless Setup:
    • With the exception of manual CUDA installation, the setup process is remarkably seamless, as outlined below.
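
The landmark caching described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `load_or_compute_landmarks`, the `_landmarks.npy` file naming, and the `detect` callback are all assumptions made for the example. Only `np.save` and `np.load` are real NumPy calls.

```python
from pathlib import Path
from typing import Callable

import numpy as np


def load_or_compute_landmarks(
    video_path: str,
    cache_dir: str,
    detect: Callable[[str], np.ndarray],
) -> np.ndarray:
    """Return landmarks for a video, reusing a cached .npy file if one exists.

    `detect` is a hypothetical stand-in for the real landmark detector.
    """
    # Hypothetical naming scheme: <video file stem>_landmarks.npy
    cache_file = Path(cache_dir) / (Path(video_path).stem + "_landmarks.npy")

    if cache_file.exists():
        # Cache hit: skip detection entirely.
        return np.load(cache_file)

    # Cache miss: run detection once and store the result for next time.
    landmarks = np.asarray(detect(video_path))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_file, landmarks)
    return landmarks
```

Note that this sketch keys the cache on the file name only, so two different videos with the same name would share a cache entry. A more careful version might include the file size or a content hash in the key.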

👓 INSTALLATION

🥎 TRIAL

Open in Google Colab

Tip

Use a GPU runtime for faster processing.


💿 SETUP AND INFERENCE

Windows (NVIDIA)

  • Clone this repository:
    • git clone https://github.com/pawansharmaaaa/Lip_Wise
  • Install Python > 3.10 from the official site or from the Microsoft Store.
  • Install winget from the Microsoft Store.
  • Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
  • Run setup.bat
  • Run launch.bat

Debian / Ubuntu / Pop!_OS (NVIDIA)

  • Clone this repository:
    • git clone https://github.com/pawansharmaaaa/Lip_Wise
  • Make sure python --version reports a version > 3.10.
  • Download and install the CUDA Toolkit that is compatible with your system. The latest version generally supports most NVIDIA 10-series graphics cards and newer models.
  • Make setup.sh executable:
    • chmod +x ./setup.sh
  • Run setup.sh by double-clicking it, or run ./setup.sh from a terminal.
  • Make launch.sh executable:
    • chmod +x ./launch.sh
  • Run launch.sh by double-clicking it, or run ./launch.sh from a terminal.
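
The Python version requirement in the steps above can also be checked from Python itself. The following is a small sketch, not part of the repository. The helper name `meets_minimum` is hypothetical, and treating 3.10 itself as acceptable is an assumption, since the instructions say "> 3.10" without stating whether 3.10 qualifies.

```python
import sys
from typing import Optional, Sequence, Tuple

# Assumption: 3.10 itself is treated as acceptable. The setup instructions
# only say "> 3.10", so adjust this if a newer minimum is actually needed.
MINIMUM: Tuple[int, int] = (3, 10)


def meets_minimum(
    version: Optional[Sequence[int]] = None,
    minimum: Tuple[int, int] = MINIMUM,
) -> bool:
    """Return True if the given (or running) Python version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= minimum


def describe() -> str:
    """Build a short human-readable status line for the running interpreter."""
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    return f"Python {current}: {status}"
```

Calling `print(describe())` before running the setup script gives a quick indication of whether the interpreter on the PATH is recent enough.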

📝 TO-DO List:

URGENT REQUIREMENTS

  • setup.bat / setup.sh
    • create venv
    • install requirements inside venv
  • CodeFormer arch initialization
  • Documentation

PREPROCESS

  • Add directory check in inference in the beginning.
  • Make preprocessing optimal.
  • Clear RAM after no_face_filter.
  • Make face coordinates reusable:
    • Saving facial coordinates as .npy file.
    • Alter code to also include eye coordinates.
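
As an illustration of the directory check item above, here is one possible shape for it. This is a sketch under assumptions: the helper name `ensure_directories` and the directory names in the usage note are hypothetical and are not taken from the repository.

```python
from pathlib import Path
from typing import Iterable, List


def ensure_directories(root: str, names: Iterable[str]) -> List[Path]:
    """Create any missing working directories under `root`.

    Returns the list of directories that had to be created, so the caller
    can log them. Raises FileExistsError if a name exists but is a file.
    """
    created: List[Path] = []
    for name in names:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

At the start of inference this could be called as `ensure_directories(".", ["inputs", "outputs", "cache"])`, where the three names are placeholders for whatever working directories the pipeline actually expects.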

IMPROVING GAN UPSCALING

  • Merge Data Pipeline with preprocessor:
    • Remove need to recrop, realign and rewarp the image.

IMPROVING WAV2LIP

  • Merge all data Pipeline:
    • Remove the need to recrop, realign, renormalize, etc.
    • Devise a way to keep frames without a face in the video.
      • Understand mel spectrograms and how the Wav2Lip model works.
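
One possible way to keep frames without a face is to pass them through unchanged and run lip sync only on frames where a face was detected. The sketch below shows that idea at the frame level. It is not the repository's implementation: the function name `sync_with_passthrough` and the `sync_face` callback are hypothetical, and a real Wav2Lip pipeline would process frames in batches together with their mel chunks.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")
Box = TypeVar("Box")


def sync_with_passthrough(
    frames: Sequence[Frame],
    boxes: Sequence[Optional[Box]],
    sync_face: Callable[[Frame, Box], Frame],
) -> List[Frame]:
    """Apply `sync_face` to frames with a detected face and keep the rest as-is.

    `boxes[i]` is the face box for `frames[i]`, or None when no face was found.
    `sync_face` is a hypothetical stand-in for the lip-sync and paste-back step.
    """
    if len(frames) != len(boxes):
        raise ValueError("frames and boxes must have the same length")

    output: List[Frame] = []
    for frame, box in zip(frames, boxes):
        if box is None:
            # No face in this frame: keep the original so the video
            # length and audio alignment stay intact.
            output.append(frame)
        else:
            output.append(sync_face(frame, box))
    return output
```

Because the output has exactly one frame per input frame, the audio track stays aligned with the video even when the speaker leaves the shot.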

OPTIONAL

  • Gradio UI
    • A tab for Video, Audio and Output.
    • A tab for Image, Audio and Output.

FURTHER IMPROVEMENTS

  • Inference without restorer
  • Model Improvement
  • Implement no_face_filter too

FUTURE PLANS

  • Per-face, per-audio lip sync using face recognition.
  • A separate tab for TTS.

COLAB NOTEBOOK

  • Optimize Inference.
  • Implement Checks.

🤗 ACKNOWLEDGEMENTS:

Thanks to the following open-source projects:

NumPy, PyTorch, TensorFlow


License: Apache License 2.0

