Jeru2023 / ai_twins


AI Twins

Perfect AI Replication

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

This project has two goals:

  1. Modularize state-of-the-art projects in Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Wav2Lip so that each module can be used independently.

     The project extracts the core functionality of leading ASR, TTS, and Wav2Lip projects and exposes each one as a standalone module. Developers can integrate a single module into an application without extensive modifications and without depending on the entire upstream project.

  2. Integrate multiple projects into a comprehensive end-to-end solution.

     By combining ASR, TTS, and Wav2Lip, the project provides a pipeline that covers the whole process of converting text into synthesized speech with synchronized lip movement. Users no longer have to coordinate several separate projects by hand.

By offering both independent modules and an integrated solution, this project provides flexibility and convenience for developers and researchers working in the field of digital humans. It enables them to leverage the best features from various projects and seamlessly integrate them into their own applications or research work.

(back to top)

Getting Started

This project uses GPT-SoVITS as a submodule for voice cloning. Please follow the GPT-SoVITS installation guide to set up the environment.

Note that I've renamed the submodule directory to GPTSoVITS by removing the dash, because a dash is not a valid character in a Python package name.

git submodule add git@github.com:RVC-Boss/GPT-SoVITS.git GPTSoVITS

For high-quality Wav2Lip output, I also added Wav2Lip-GFPGAN as a submodule and symlinked its two subprojects (Wav2Lip and GFPGAN) into the project root:

git submodule add git@github.com:ajay-sainy/Wav2Lip-GFPGAN.git Wav2Lip-GFPGAN
ln -s Wav2Lip-GFPGAN/Wav2Lip-master ./Wav2Lip
ln -s Wav2Lip-GFPGAN/GFPGAN-master ./GFPGAN
git add Wav2Lip
git add GFPGAN

Prerequisites

  1. Install ffmpeg

  2. Pretrained models

     Download the pretrained models from GPT-SoVITS Models and place them in GPTSoVITS/GPT_SoVITS/pretrained_models
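
Both prerequisites can be checked before the first run. The following is a minimal sketch, not part of the project: the helper name check_prerequisites is hypothetical, and it only assumes that ffmpeg should be on PATH and that the models live in the directory named above.

```python
import shutil
from pathlib import Path


def check_prerequisites(project_root="."):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper, not part of ai_twins.
    """
    problems = []

    # ffmpeg must be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Directory named in the README for the GPT-SoVITS pretrained models.
    models_dir = Path(project_root) / "GPTSoVITS" / "GPT_SoVITS" / "pretrained_models"
    if not models_dir.is_dir():
        problems.append(f"missing directory: {models_dir}")
    elif not any(p.name != ".gitignore" for p in models_dir.iterdir()):
        problems.append(f"no model files found in: {models_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run it from the repository root. It only reports problems and does not download or install anything.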

Installation

  1. Create an independent Python environment.
     conda create -n ai_twins python=3.9
     conda activate ai_twins
  2. Clone the repo and enter it
     git clone https://github.com/Jeru2023/ai_twins.git
     cd ai_twins
  3. Install packages
     pip install -r requirements.txt
     pip install -r GPTSoVITS/requirements.txt
  4. Overwrite config.py under GPTSoVITS
     cp config-GPTSoVITS.py GPTSoVITS/config.py
  5. To run the tests or the WebUI demos, create three empty folders under output: slice_trunks, tts, upload
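
The three output folders can also be created from Python. This is a small sketch assuming the folder names listed above; ensure_output_folders is a hypothetical helper, not a function provided by the project.

```python
from pathlib import Path

# Folder names taken from the installation steps above.
OUTPUT_SUBFOLDERS = ("slice_trunks", "tts", "upload")


def ensure_output_folders(project_root="."):
    """Create output/<name> for each expected subfolder; safe to call repeatedly."""
    output_dir = Path(project_root) / "output"
    folders = []
    for name in OUTPUT_SUBFOLDERS:
        folder = output_dir / name
        # parents=True also creates output/ itself; exist_ok=True makes this idempotent.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    for folder in ensure_output_folders():
        print("ready:", folder)
```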

(back to top)

Usage

Function call

Please refer to the test files in the test folder.

TTS

from infer.tts_model import TTSModel
import utils
from infer.persona_enum import PersonaEnum
import os


tts_model = TTSModel()
root_path = utils.get_root_path()

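persona_name = PersonaEnum.NORMAL_FEMALE.get_name()
# Sample sentence: "The weather is nice today, I'm really so happy."
text = '今天天气不错呀,我真的太开心了。'
uuid = utils.generate_unique_id(text)

# Language of the input text: 'zh' for Chinese, 'en' for English
text_language = 'zh'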
output_path = os.path.join(root_path, 'output', 'tts', f'{uuid}.wav')

tts_model.inference(persona_name, text, text_language, output_path)
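
Because the output file name is derived from the text, repeated requests for the same sentence could reuse an existing file. The sketch below illustrates that idea under explicit assumptions: text_to_id is a hypothetical stand-in (the real utils.generate_unique_id may work differently), and synthesize is any callable that takes the same four arguments as tts_model.inference above.

```python
import hashlib
import os


def text_to_id(text, persona_name, text_language):
    """Hypothetical stand-in for utils.generate_unique_id: a stable hash-based ID."""
    key = "\x1f".join((persona_name, text_language, text))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def synthesize_cached(synthesize, persona_name, text, text_language, out_dir):
    """Call synthesize(...) only if the target wav file does not exist yet.

    `synthesize` must accept (persona_name, text, text_language, output_path),
    the argument order used by tts_model.inference in the example above.
    """
    os.makedirs(out_dir, exist_ok=True)
    file_name = f"{text_to_id(text, persona_name, text_language)}.wav"
    output_path = os.path.join(out_dir, file_name)
    if not os.path.exists(output_path):
        synthesize(persona_name, text, text_language, output_path)
    return output_path
```

With the objects from the example above, a call might look like synthesize_cached(tts_model.inference, persona_name, text, text_language, os.path.join(root_path, 'output', 'tts')).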

ASR

import utils
import os
from infer.asr_model import ASRModel

asr_model = ASRModel()

audio_sample = os.path.join(utils.get_root_path(), 'data', 'audio', 'sample_short.wav')
output = asr_model.inference(audio_sample)
print(output)
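
To transcribe several files, for example the chunks produced by the audio slicer, the single-file call can be wrapped in a loop. This is a minimal sketch: transcribe_folder is a hypothetical helper, and it assumes only that the transcription callable takes a file path and returns a result, as asr_model.inference does above.

```python
import os


def transcribe_folder(transcribe, folder, extensions=(".wav",)):
    """Run transcribe(path) on every matching file in folder, in sorted order.

    `transcribe` is any callable that maps an audio file path to a result.
    `extensions` must be a tuple of lowercase file suffixes.
    """
    results = {}
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(extensions):
            results[name] = transcribe(os.path.join(folder, name))
    return results
```

With the model from the example above, this could be called as transcribe_folder(asr_model.inference, os.path.join(utils.get_root_path(), 'output', 'slice_trunks')).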

Audio Slice

import os
import utils
from tools import slice_tool

root_path = utils.get_root_path()
# Long recording to split, and the folder that receives the slices
in_path = os.path.join(root_path, 'data', 'audio', 'sample_long.wav')
out_folder = os.path.join(root_path, 'output', 'slice_trunks')
out_file_prefix = 'sample'
slice_tool.slice_audio(in_path, out_folder, out_file_prefix, threshold=-40)
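
The README does not state the unit of threshold. Silence-based slicers commonly express it in decibels relative to full scale, so the sketch below assumes that -40 means -40 dB. It shows the general idea of marking frames whose RMS level falls below the threshold; it is not the algorithm used by slice_tool, and all function names are hypothetical.

```python
import math


def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale 1.0) to a linear amplitude."""
    return 10 ** (db / 20)


def frame_rms(samples, frame_size):
    """RMS of each non-overlapping frame; samples are floats in [-1.0, 1.0]."""
    values = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        values.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return values


def silent_frames(samples, frame_size, threshold_db=-40):
    """Return one boolean per frame: True where the frame is quieter than the threshold."""
    limit = db_to_amplitude(threshold_db)
    return [value < limit for value in frame_rms(samples, frame_size)]
```

Under this assumption, -40 dB corresponds to a linear amplitude of about 0.01, so a lower (more negative) threshold treats fewer frames as silence.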

Web Demo

ASR Toolkit Demo

  python asr_toolkit_webui.py

ASR

[Screenshot: ASR demo]

Audio Slicer

[Screenshot: Audio Slicer demo]

TTS Toolkit Demo

  python tts_toolkit_webui.py

[Screenshot: TTS demo]

(back to top)

Roadmap

  • [*] Toolkit - ASR
  • [*] Toolkit - Audio Slicer
  • [ ] Toolkit - Vocal Separation
  • [*] Toolkit - Zero-shot TTS
  • [*] ASR Toolkit Demo WebUI
  • [*] TTS Toolkit Demo WebUI
  • [ ] Wav2Lip integration
  • [ ] GFPGAN enhancement
  • [ ] Integration Demo WebUI

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Jeru Liu - @Jeru_AGI - jeru.token@gmail.com

Project Link: https://github.com/Jeru2023/ai_twins

(back to top)

Acknowledgments

(back to top)
