sunnnnnnnny's repositories
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
GPT-SoVITS
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
ChatTTS
ChatTTS is a generative speech model for daily dialogue.
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
megatts2
Unofficial implementation of Megatts2
emotion2vec
Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
NS2VC
Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech
fish-speech
Brand new TTS solution
Transformer-TTS
A PyTorch Implementation of "Neural Speech Synthesis with Transformer Network"
Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
OpenVoice
Instant voice cloning by MyShell
hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
stable-diffusion
A latent text-to-image diffusion model
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
StyleTTS
Official Implementation of StyleTTS
HierSpeechpp
The official implementation of HierSpeech++
versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
AuxiliaryASR
Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
PitchExtractor
Deep Neural Pitch Extractor for Voice Conversion and TTS Training
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
BBDown
Bilibili Downloader. A command-line downloader for Bilibili.
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
Tacotron2-PyTorch
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.