Jeru2023/ai_twins

AI Twins

Perfect AI Replication

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

This project aims to achieve two goals:

Modularization of functionalities from state-of-the-art projects in Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Wav2Lip, allowing independent usage of each module.

The project focuses on extracting the functionalities of leading projects in the field of ASR, TTS, and Wav2Lip and making them available as individual modules. By doing so, developers can easily integrate these modules into their applications as standalone components, without the need for extensive modifications or dependencies on the entire project.

Integration of multiple projects into a comprehensive end-to-end solution.

This project aims to combine the functionalities of various projects into a unified and seamless solution. By integrating ASR, TTS, and Wav2Lip capabilities, it provides a comprehensive pipeline that covers the entire process of converting text to synthesized speech with lip movement synchronization. This end-to-end solution simplifies the overall workflow and allows users to achieve high-quality results without the hassle of manually coordinating multiple projects.

By offering both independent modules and an integrated solution, this project provides flexibility and convenience for developers and researchers working in the field of digital humans. It enables them to leverage the best features from various projects and seamlessly integrate them into their own applications or research work.

(back to top)

Getting Started

This project uses GPT-SoVITS as submodule for voice clone, please follow the installation guideline from GPT-SoVITS to setup environment.

Please mind I've rename GPT-SoVITS by removing dash, as it's invalid character as pacakge name.

git submodule add git@github.com:RVC-Boss/GPT-SoVITS.git GPTSoVITS

For high quality wav2lip, I also cloned two submodules from Wav2Lip-GFPGAN

git submodule add git@github.com:ajay-sainy/Wav2Lip-GFPGAN.git Wav2Lip-GFPGAN
ln -s Wav2Lip-GFPGAN/Wav2Lip-master ./Wav2Lip
ln -s Wav2Lip-GFPGAN/GFPGAN-master ./GFPGAN
git add Wav2Lip
git add GFPGAN

Prerequisites

Install ffmpeg
Pretrain Models

Download pretrained models from GPT-SoVITS Models and place them in GPTSoVITS/GPT_SoVITS/pretrained_models

Installation

Create independent Python Environment.

 conda create -n ai_twins python=3.9
 conda activate ai_twins

Clone the repo

git clone https://github.com/Jeru2023/ai_twins.git

Install packages

install -r requirements.txt
install -r GPTSoVITS/requirements.txt

Overwrite config.py under GPTSoVITS

cd ai_twins
cp config-GPTSoVITS.py GPTSoVITS/config.py

To run test or WebUI Demo, create three empty folders in output: slice_trunks, tts, upload

(back to top)

Usage

Function call

Please refer to the test files in test folder.

TTS

from infer.tts_model import TTSModel
import utils
from infer.persona_enum import PersonaEnum
import os


tts_model = TTSModel()
root_path = utils.get_root_path()

persona_name = PersonaEnum.NORMAL_FEMALE.get_name()
text = '今天天气不错呀，我真的太开心了。'
uuid = utils.generate_unique_id(text)

# en for english
text_language = 'zh'
output_path = os.path.join(root_path, 'output', 'tts', f'{uuid}.wav')

tts_model.inference(persona_name, text, text_language, output_path)

ASR

import utils
import os
from infer.asr_model import ASRModel

asr_model = ASRModel()

audio_sample = os.path.join(utils.get_root_path(), 'data', 'audio', 'sample_short.wav')
output = asr_model.inference(audio_sample)
print(output)

Audio Slice

import utils
from tools import slice_tool

in_path = utils.get_root_path() + '/data/audio/sample_long.wav'
out_folder = utils.get_root_path() + '/output/slice_trunks'
out_file_prefix = 'sample'
slice_tool.slice_audio(in_path, out_folder, out_file_prefix, threshold=-40)

Web Demo

ASR Toolkit Demo

  python asr_toolkit_webui.py

ASR

Audio Slicer

TTS Toolkit Demo

ASR Toolkit Demo

  python tts_toolkit_webui.py

(back to top)

Roadmap

[*] Toolkit - ASR
[*] Toolkit - Audio Slicer
Toolkit - Vocal Separation
[*] Toolkit - Zero-shot TTS
[*] ASR Toolkit Demo WebUI
[*] TTS Toolkit Demo WebUI
wav2lip integration
GFPGAN enhancement
integration Demo WebUI

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!