我的AI世界's repositories
crf_torch_onnx
A PyTorch implementation of CRF that can be exported to ONNX
3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
bark
🔊 Text-Prompted Generative Audio Model
Bert-VITS2
vits2 backbone with multilingual-bert
ChatTTS
TTS
Chinese-Names-Corpus
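Chinese names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.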
CosyVoice
LLM-based TTS model providing full-stack inference, training, and deployment capabilities.
EasyBertVits2
An easy way to use Bert-VITS2, which generates emotionally expressive speech from text.
espeak-phonemizer
Uses ctypes and libespeak-ng to transform text into IPA phonemes
fish-speech
Brand new TTS solution
FunASR
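A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. A speech recognition toolkit with a rich set of high-performing open-source pretrained models; it supports speech recognition, voice activity detection, text post-processing, and more, and can be deployed as a service.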
GPT-SoVITS
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
leedl-tutorial
Hung-yi Lee's Deep Learning Tutorial (《李宏毅深度学习教程》). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
MassTTS
A TTS demo for training new characters.
megatts2
Unofficial implementation of Megatts2
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
parler-tts
Inference and training library for high-quality TTS models.
polyphone
Chinese polyphone disambiguation for Text-to-Speech applications
sherpa-onnx
Speech-to-text and text-to-speech using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go
spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch
StyleTTS
Official Implementation of StyleTTS
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
tts-frontend-dataset
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization
Viphoneme
Vi_G2P or ViG2P: a G2P package for Vietnamese, based on vPhon and phonological knowledge, that converts raw text (graphemes) to IPA
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit