PrashantDixit0 / AI-Avatar-RealTime

Talking AI Avatar in Realtime

Documentation

1. Installation

Linux/Unix

  1. Install Anaconda, Python and git.

  2. Create the environment and install the requirements.

git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### Coqui TTS is optional; it is used by the Gradio demo.
### pip install TTS
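After these steps, a quick sanity check can catch a CPU-only PyTorch build or an ffmpeg that is not on PATH. The following is a minimal Python sketch, not part of SadTalker; `check_environment` is a hypothetical helper name. It only reports whether `torch` and Coqui `TTS` are importable, whether `torch.cuda.is_available()` returns true, and whether an `ffmpeg` executable can be found.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether the components installed above are visible (hypothetical helper)."""
    report = {
        # find_spec returns None when the top-level package is not installed
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        # shutil.which returns the executable's full path, or None if not on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Coqui TTS installs the top-level package "TTS"
        "tts_installed": importlib.util.find_spec("TTS") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without torch

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

Run it inside the activated sadtalker environment; `tts_installed: no` is expected if you skipped the optional TTS install.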

Windows

A video tutorial in Chinese is available here. Alternatively, follow these instructions:

  1. Install Python 3.8 and check "Add Python to PATH".
  2. Install git manually or using Scoop: scoop install git.
  3. Install ffmpeg by following this tutorial or using Scoop: scoop install ffmpeg.
  4. Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
  5. Download the checkpoints and gfpgan models from the download section.
  6. Run start.bat from Windows Explorer as a normal (non-administrator) user; this starts a Gradio-powered WebUI demo.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

bash scripts/download_models.sh

We also provide an offline patch (gfpgan/), so no models need to be downloaded during generation.

Pre-Trained Models

GFPGAN Offline Patch

Model Details

Model descriptions:

New version

Model | Description
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker.
checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render).
checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render).
gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan.

Old version

Model | Description
checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker.
checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker.
checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker.
checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker.
checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from a reproduction of face-vid2vid.
checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from Deep3DFaceReconstruction.
checkpoints/wav2lip.pth | Highly accurate lip-sync model from Wav2Lip.
checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib.
checkpoints/BFM | 3DMM library file.
checkpoints/hub | Face detection models used in face alignment.
gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan.
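To confirm that a download finished, a small Python check can list which expected paths are missing. This is a minimal sketch, not a SadTalker tool: `missing_models` and `NEW_VERSION_FILES` are hypothetical names, and the path list is copied from the "New version" table. Pass a different list if you use the old-version checkpoints.

```python
from pathlib import Path

# Relative paths copied from the "New version" table above (hypothetical constant).
NEW_VERSION_FILES = [
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "gfpgan/weights",
]


def missing_models(repo_root, expected=NEW_VERSION_FILES):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    # exists() is true for both files and directories (e.g. gfpgan/weights)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_models(".")  # run from the SadTalker repository root
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All listed model files are present.")
```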

[Image: the final folder structure after downloading the models]

3. Quick Start

Please read our documentation on best practices and configuration tips.

WebUI Demos

Online Demo: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: Please refer to WebUI docs.

Local Gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:

## You need to install TTS (https://github.com/coqui-ai/TTS) manually in advance via `pip install TTS`.
python app_sadtalker.py

You can also start it more easily:

  • Windows: double-click webui.bat; the requirements will be installed automatically.
  • Linux/macOS: run bash webui.sh to start the WebUI.

CLI usage

Animate a portrait image using the default configuration:
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan 

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
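Because each run writes into a new timestamped location, a helper that picks the newest video can be convenient in scripts. The minimal Python sketch below uses a hypothetical function, `latest_result`, that is not part of SadTalker. It searches the results directory recursively, an assumption that lets it work whether the .mp4 lands inside the timestamp folder or next to it, and selects by file modification time rather than by parsing folder names.

```python
from pathlib import Path
from typing import Optional


def latest_result(results_dir="results") -> Optional[Path]:
    """Return the most recently modified .mp4 under results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    # rglob also matches .mp4 files directly inside results_dir
    videos = [p for p in root.rglob("*.mp4") if p.is_file()]
    if not videos:
        return None
    return max(videos, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_result() or "No results found.")
```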

Full-body/image generation:

Use --still to generate a natural full-body video. You can add an enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan 
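For batch runs, the same command can be assembled in Python. This is a minimal sketch that uses only the flags shown above; `build_inference_cmd` is a hypothetical helper, and `audio.wav`/`picture.png` are placeholder inputs. It assumes it is run from the SadTalker repository root with the sadtalker environment active.

```python
import shlex
import sys
from typing import List, Optional


def build_inference_cmd(
    driven_audio: str,
    source_image: str,
    result_dir: Optional[str] = None,
    still: bool = False,
    preprocess: Optional[str] = None,
    enhancer: Optional[str] = "gfpgan",
) -> List[str]:
    """Assemble an inference.py argument list from the documented flags."""
    cmd = [sys.executable, "inference.py",
           "--driven_audio", driven_audio,
           "--source_image", source_image]
    if result_dir is not None:
        cmd += ["--result_dir", result_dir]
    if still:
        cmd.append("--still")  # boolean flag, takes no value
    if preprocess is not None:
        cmd += ["--preprocess", preprocess]
    if enhancer is not None:
        cmd += ["--enhancer", enhancer]
    return cmd


if __name__ == "__main__":
    import subprocess

    # Placeholder inputs mirroring the full-body example above.
    cmd = build_inference_cmd("audio.wav", "picture.png", result_dir="my_results",
                              still=True, preprocess="full")
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list to subprocess.run (instead of a shell string) avoids quoting problems when file paths contain spaces.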

More examples, configuration options, and tips can be found in the best practices document.

Citations

We also use the following 3rd-party libraries:


License: Other

