我的AI世界's repositories
3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Bert-VITS2
vits2 backbone with multilingual-bert
ChatTTS
TTS
EasyBertVits2
An easy way to use Bert-VITS2, which generates emotionally expressive speech from text.
espeak-phonemizer
Uses ctypes and libespeak-ng to transform text into IPA phonemes
fish-speech
Brand new TTS solution
FunASR
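A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. A speech recognition toolkit with a rich set of high-performance open-source pretrained models; it supports speech recognition, voice activity detection, text post-processing, and more, and can be deployed as a service.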
GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning)
HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
leedl-tutorial
Hung-yi Lee's Deep Learning Tutorial. PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
MassTTS
A TTS demo for training new characters.
megatts2
Unofficial implementation of Megatts2
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
parler-tts
Inference and training library for high-quality TTS models.
sherpa-onnx
Speech-to-text and text-to-speech using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go
spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch
StyleTTS
Official Implementation of StyleTTS
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
tensorflow-wavenet
A TensorFlow implementation of DeepMind's WaveNet paper
VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
vall-e_
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
VALOR
Code and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
VITS-fast-fine-tuning
A VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
vits2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit