tsaifangsheng's repositories
AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
AuxiliaryASR
Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
bddm
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis
bark
🔊 Text-Prompted Generative Audio Model
CDiffuSE
Conditional Diffusion Probabilistic Model for Speech Enhancement
Comprehensive-Transformer-TTS
A Non-Autoregressive Transformer-based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS
ControlNet
Let us control diffusion models!
diffsptk
A differentiable version of SPTK
diffusion_distiller
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models (v-diffusion)"
FastDiff
PyTorch Implementation of FastDiff (IJCAI'22)
GeneFace
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
google-research
Google Research
GST-Tacotron
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
iSTFTNet-pytorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
MSMC-TTS
Official Implementation of Multi-Stage Multi-Codebook (MSMC) TTS
NeuralSVB
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
nix-tts
🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation
nnsvs
Neural network-based singing voice synthesis library for research
PitchExtractor
Deep Neural Pitch Extractor for Voice Conversion and TTS Training
ProDiff
PyTorch Implementation of ProDiff (ACM-MM'22) with an Extremely-Fast diffusion speech synthesis pipeline
self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
stable-diffusion
A latent text-to-image diffusion model
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
state-spaces
Sequence Modeling with Structured State Spaces
StyleFlow
StyleFlow: Attribute-conditioned Exploration of StyleGAN-generated Images using Conditional Continuous Normalizing Flows (ACM TOG 2021)
valle
Zero-Shot Text-To-Speech