There are 425 repositories under text-to-speech topic.
🚀AI拟声: 5秒内克隆您的声音并生成任意语音内容 Clone a voice in 5 seconds to generate arbitrary speech in real-time
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Instant voice cloning by MIT and MyShell.
Translate the video from one language to another and add dubbing. 将视频从一种语言翻译为另一种语言,并添加配音
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Foundational model for human-like, expressive TTS
This repository contains a hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM etc
Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
DeepMind's Tacotron-2 Tensorflow implementation
The official Python API for ElevenLabs Text to Speech.
Offline Text To Speech synthesis for python