speech-synthesis

There are 140 repositories under the speech-synthesis topic.

coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
python text-to-speech deep-learning speech pytorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion
Language:Python 42605
leon-ai / leon
🧠 Leon is your open-source personal assistant.
ai ai-assistant artificial-intelligence assistant automation bot chatbot flite leon nodejs offline personal-assistant privacy python speech-recognition speech-synthesis speech-to-text text-to-speech virtual-assistant voice-assistant
Language:TypeScript 16648
NVIDIA-NeMo / NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
machine-translation speaker-recognition asr tts generative-ai multimodal deeplearning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models
Language:Python 15699
NVIDIA / DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
computer-vision deep-learning drug-discovery forecasting large-language-models mxnet paddlepaddle pytorch recommender-systems speech-recognition speech-synthesis tensorflow tensorflow2 translation nlp
Language:Jupyter Notebook 14484
PaddlePaddle / PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr kws speech-recognition sound-classification voice-cloning vocoder voice-recognition self-supervised-learning wav2vec2 whisper code-switch
Language:Python 12228
rhasspy / piper
A fast, local neural text-to-speech system
speech-synthesis text-to-speech tts
Language:C++ 10007
espnet / espnet
End-to-End Speech Processing Toolkit
deep-learning end-to-end chainer pytorch kaldi speech-recognition speech-synthesis speech-translation machine-translation voice-conversion speech-enhancement speech-separation singing-voice-synthesis speaker-diarization spoken-language-understanding text-to-speech
Language:Python 9460
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder
Language:Python 9383
voicepaw / so-vits-svc-fork
A so-vits-svc fork with real-time support, an improved interface, and more features.
sovits vits voice-conversion so-vits-svc hubert softvc realtime voice-changer deep-learning pytorch speech-synthesis contentvec gan lightning pytorch-lightning hacktoberfest
Language:Python 9119
rany2 / edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
tts speech-synthesis text-to-speech
Language:Python 9061
netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
pytorch speech speech-synthesis tts multi-speaker text-to-speech deep-learning prompt emotivoice ai python emotion style
Language:Python 8322
jaywalnut310 / vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
tts text-to-speech pytorch deep-learning speech-synthesis
Language:Python 7679
yl4579 / StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
deep-learning pytorch speaker-adaptation speech-synthesis text-to-speech tts wavlm diffusion-models latent-diffusion latent-diffusion-models adversarial-training gan
Language:Python 5962
espeak-ng / espeak-ng
eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.
espeak-ng espeak android text-to-speech speech-synthesis
Language:C 5564
snakers4 / silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark pytorch colab onnx torch-hub text-to-speech tts-models speech speech-synthesis tts repunctuation capitalization
Language:Jupyter Notebook 5481
abus-aikorea / voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
audiobook faster-whisper gradio karaoke podcasts speech-recognition speech-synthesis speech-to-text subtitles text-to-speech transcription translator tts voice-cloning voice-conversion webui whisper whisperx yt-dlp
Language:Python 4808
MoonInTheRiver / DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
text-to-speech diffusion-speedup tts aaai2022 singing-synthesis diffusion-model speech-synthesis singing-voice-synthesis singing-voice singing-voice-database midi
Language:Python 4616
WhisperSpeech / WhisperSpeech
An open-source text-to-speech system built by inverting Whisper.
pytorch speech-synthesis tts
Language:Jupyter Notebook 4362
huggingface / speech-to-speech
Speech To Speech: an effort toward an open-source, modular GPT-4o
ai assistant language-model machine-learning python speech speech-synthesis speech-to-text speech-translation
Language:Python 4180
metavoiceio / metavoice-src
Foundational model for human-like, expressive TTS
text-to-speech ai deep-learning pytorch speech speech-synthesis tts voice-clone zero-shot-tts
Language:Python 4159
TensorSpeech / TensorFlowTTS
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
speech-synthesis text-to-speech tensorflow2 melgan fastspeech real-time tts vocoder multi-speaker-tts fastspeech2 multiband-melgan tacotron2 parallel-wavegan tflite mobile-tts zh-tts chinese-tts korea-tts german-tts japanese-tts
Language:Python 3975
denizsafak / abogen
Generate audiobooks from EPUBs, PDFs and text with synchronized captions.
audiobook audiobooks content-creation content-creator epub-converter kokoro kokoro-82m kokoro-tts media-generation narrator speech-synthesis subtitles text-to-audio text-to-speech tts voice-synthesis
Language:Python 3549
KoljaB / RealtimeTTS
Converts text to speech in real time
python realtime speech-synthesis text-to-speech
Language:Python 3508
stakira / OpenUtau
Open singing synthesis platform / Open source UTAU successor
utau vocaloid music singing-synthesis singing-voice-synthesis vocal-synthesis speech-synthesis vogen openutau
Language:C# 3125
zzw922cn / awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
automatic-speech-recognition papers roadmap rnn cnn dnn attention-mechanism seq2seq acoustic-model timit-dataset tts language-model speaker-verification speech-recognition speech-synthesis neural-network recognition-synthesis diffusion-models singing-voice-synthesis voice-conversion
3068
keithito / tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with a pre-trained model (unofficial)
tacotron tensorflow speech-synthesis python machine-learning tts
Language:Python 2985
tensorflow / lingvo
Lingvo
speech-recognition translation speech-to-text machine-translation mnist seq2seq language-model tts asr lm nlp tensorflow speech research distributed gpu-computing speech-synthesis
Language:Python 2852
Camb-ai / MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
prosody speech speech-synthesis text-to-speech voice-cloneai voice-cloning
Language:Jupyter Notebook 2796
Blaizzy / mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
apple-silicon audio-processing mlx multimodal speech-recognition speech-synthesis speech-to-text text-to-speech transformers
Language:Python 2668
marytts / marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
speech-synthesis tts java text-to-speech
Language:Java 2534
cogentapps / chat-with-gpt
An open-source ChatGPT app with a voice
artificial-intelligence chatgpt chatgpt-api gpt-3 self-hosted chat speech-synthesis gpt-4 llm llms
Language:TypeScript 2366
r9y9 / wavenet_vocoder
WaveNet vocoder
wavenet speech-synthesis speech-processing pytorch python wavenet-vocoder neural-vocoder speech
Language:Python 2366
Rayhane-mamah / Tacotron-2
A TensorFlow implementation of Google's Tacotron-2
tacotron tensorflow paper python speech-synthesis text-to-speech wavenet
Language:Python 2313
jik876 / hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
speech-synthesis gan text-to-speech tts deep-learning hifi-gan pytorch vocoder
Language:Python 2222
fatchord / WaveRNN
WaveRNN Vocoder + TTS
wavernn pytorch neural-vocoder speech-synthesis tts tacotron text-to-speech
Language:Python 2165
r9y9 / deepvoice3_pytorch
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
end-to-end machine-learning multi-speaker python pytorch speech-processing speech-synthesis tts
Language:Python 1980