a80093119 / Voicecho

Reconstruct the people you miss with AI models and voice, and also chat with an LLM

Voicecho

Quick Start

1. Install Requirements

Follow the original repo to check that your environment is ready. **Python 3.7 or higher** is needed to run the toolbox.

  • Install PyTorch.
  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.
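The three requirements above can be checked before going further. The following is a minimal sketch, not part of the repo: the function name check_environment is hypothetical, and it only tests that Python is at least 3.7, that a module named torch is importable, and that an ffmpeg executable is on PATH. It does not verify the packages listed in requirements.txt.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this README


def check_environment(min_python=MIN_PYTHON):
    """Return a dict mapping each requirement name to True or False.

    Hypothetical helper: it only checks for presence, not for
    compatible versions of PyTorch or ffmpeg.
    """
    return {
        # Compare (major, minor) of the running interpreter.
        "python": sys.version_info[:2] >= min_python,
        # find_spec returns None when the top-level module is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any line reported as MISSING points to the install step that still needs attention.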

Note that we use the pretrained encoder and vocoder but not the pretrained synthesizer, since the original model is incompatible with Chinese symbols. This means demo_cli does not work at the moment.

2. Train synthesizer with your dataset

  • Download the aidatatang_200zh or SLR68 dataset and unzip it; make sure you can access all the .wav files in the train folder

  • Preprocess the audio and the mel spectrograms: python synthesizer_preprocess_audio.py <datasets_root>. Pass the parameter --dataset {dataset} to select the dataset; aidatatang_200zh and magicdata are supported

  • Preprocess the embeddings: python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer

  • Train the synthesizer: python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer

  • Go to the next step once the attention line appears and the loss meets your needs; check the training folder synthesizer/saved_models/.
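The preprocessing and training steps above can be collected into one sequence. This is a rough sketch under stated assumptions: count_wavs and build_pipeline are hypothetical helpers, not part of the repo, and the directory passed to count_wavs is whatever train folder your unzipped dataset contains. The script names, the --dataset parameter, the mandarin run name, and the SV2TTS/synthesizer path are taken from the steps above.

```python
import shlex
from pathlib import Path


def count_wavs(train_dir):
    """Count .wav files under train_dir, searching subfolders too."""
    return sum(1 for _ in Path(train_dir).rglob("*.wav"))


def build_pipeline(datasets_root, dataset="aidatatang_200zh", run_id="mandarin"):
    """Return the three commands from this section as argument lists.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    root = Path(datasets_root)
    synth_dir = root / "SV2TTS" / "synthesizer"
    return [
        # Step 1: audio and mel spectrograms.
        ["python", "synthesizer_preprocess_audio.py", str(root), "--dataset", dataset],
        # Step 2: embeddings.
        ["python", "synthesizer_preprocess_embeds.py", str(synth_dir)],
        # Step 3: train the synthesizer.
        ["python", "synthesizer_train.py", run_id, str(synth_dir)],
    ]


if __name__ == "__main__":
    # "/path/to/datasets" is a placeholder for <datasets_root>.
    for cmd in build_pipeline("/path/to/datasets"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the steps instead of printing them, each list can be passed to subprocess.run(cmd, check=True) from the repository root, so that a failing step stops the sequence.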

FYI, my attention line appeared after 18k steps, and the loss dropped below 0.4 after 50k steps.

[Figures: attention_step_20500_sample_1, step-135500-mel-spectrogram_sample_1]

A link to my early trained model: Baidu Yun, code: aid4

3. Launch the Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py

Good news🤩: Chinese Characters are supported
