Anzi-xbb / zhrtvc

Zhongwen real time voice cloning

zhrtvc

Chinese Real Time Voice Cloning

Version

v1.1.2

See the readme for details.

Directory overview

zhrtvc

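Source code, comprising the encoder, synthesizer, vocoder, and toolbox modules. It covers both model training and the visual toolbox for synthesizing speech.

Scripts must be run from inside the zhrtvc directory.

For notes on the code itself, see the readme file in the zhrtvc directory.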

models

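Pretrained models for the encoder, synthesizer, and vocoder.

Download the pretrained models from Baidu Netdisk, extract the archive, and replace the models folder with its contents.

Link: https://pan.baidu.com/s/14hmJW7sY5PYYcCFAbqV0Kw

Extraction code: zl9i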
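The "extract and replace the folder" step can be scripted. The following is a minimal sketch, not part of the repository: it assumes the download is a zip archive (the source does not state the format), and the function name `replace_folder_from_zip` and the archive path are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def replace_folder_from_zip(archive: Path, repo_root: Path, folder: str = "models") -> Path:
    """Extract `archive` and use its contents to replace `repo_root/folder`.

    Assumption: the archive is a zip file. If it contains a top-level
    directory named `folder`, that directory becomes the new folder;
    otherwise the whole archive content is used.
    """
    target = repo_root / folder
    staging = repo_root / (folder + "_staging")  # hypothetical scratch directory

    # Start from a clean staging area.
    if staging.exists():
        shutil.rmtree(staging)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging)

    # Prefer an inner directory with the expected name, if present.
    inner = staging / folder
    source = inner if inner.is_dir() else staging

    # Remove the old folder, then move the extracted one into place.
    if target.exists():
        shutil.rmtree(target)
    shutil.move(str(source), str(target))

    # Clean up whatever is left of the staging area.
    if staging.exists():
        shutil.rmtree(staging)
    return target
```

The same helper would work for the data folder by passing `folder="data"`. Note that it deletes the existing folder, so any local checkpoints stored there should be backed up first.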

data

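Sample corpus, including aligned speech and text samples and preprocessed sample data ready for training the synthesizer.

You can run synthesizer_preprocess_audio.py and synthesizer_preprocess_embeds.py directly to convert the aligned speech and text in samples into the SV2TTS data used to train the synthesizer.

Download the sample corpus from Baidu Netdisk, extract the archive, and replace the data folder with its contents.

Link: https://pan.baidu.com/s/1Q_WUrmb7MW_6zQSPqhX9Vw

Extraction code: bivr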

Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. Feel free to check my thesis if you're curious or if you're looking for info I haven't documented yet (don't hesitate to open an issue for that too). I would mainly recommend taking a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model trained to generalize to new voices.
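The data flow through the three stages can be sketched as follows. This is a toy illustration only: the three functions are hypothetical stand-ins written in plain Python, not the repository's encoder, synthesizer, or vocoder, and they do no real signal processing. The sketch only shows what each stage consumes and produces: variable-length audio becomes a fixed-size speaker embedding, text plus that embedding becomes a sequence of spectrogram-like frames, and the frames become a waveform.

```python
import math
from typing import List

EMBED_DIM = 4  # toy embedding size, chosen only for this sketch


def encode_speaker(wav: List[float]) -> List[float]:
    """Stage 1 stand-in: map variable-length audio to a fixed-size, unit-length vector."""
    chunks = [wav[i::EMBED_DIM] for i in range(EMBED_DIM)]
    raw = [sum(c) / max(len(c), 1) for c in chunks]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def synthesize_mel(text: str, embed: List[float]) -> List[List[float]]:
    """Stage 2 stand-in: one frame per character, shifted by the speaker embedding."""
    return [[(ord(ch) % 32) / 32.0 + e for e in embed] for ch in text]


def vocode(mel: List[List[float]], hop: int = 2) -> List[float]:
    """Stage 3 stand-in: expand every frame into `hop` waveform samples."""
    wav: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wav.extend([level] * hop)
    return wav


def clone_voice(reference_wav: List[float], text: str) -> List[float]:
    """Chain the three stages: reference audio + text -> synthesized waveform."""
    embed = encode_speaker(reference_wav)
    mel = synthesize_mel(text, embed)
    return vocode(mel)


reference = [0.1, -0.2, 0.3, 0.4, 0.5, -0.1]  # made-up samples
output = clone_voice(reference, "你好")
print(len(output))  # two characters, two samples per frame
```

The point of the separation is that the speaker embedding is the only thing that changes when switching voices: the same text with a different reference clip yields different frames, without retraining the later stages.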

Papers implemented

| arXiv ID | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1712.05884 | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2 |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
