Лэюань's starred repositories
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
metavoice-src
Foundational model for human-like, expressive TTS
emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
python-audio-separator
Easy-to-use vocal separation from the CLI or as a Python package, using a variety of amazing models (primarily trained by @Anjok07 as part of UVR)
libriheavy
Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context
naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
supervoice
VoiceBox neural network implementation
OpenPhonemizer
Permissively licensed, open-source, local IPA phonemizer (G2P) powered by deep learning.
TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
DTTNet-Pytorch
An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation
pflow-encodec
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
LangSegment
A multilingual (97 languages) tool for automatic recognition and segmentation of text content. A powerful automatic segmentation tool for mixed multilingual (97 languages) text, intended for TTS.
FlashSpeech
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Train_Hifigan_XTTS
An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.
speechtoolkit
[EARLY PUBLIC ALPHA] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activity detection, and more!