Wendong Gan's repositories
allosaurus
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
audiolm-pytorch
Implementation of AudioLM, a Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
CharsiuG2P
Multilingual G2P in 100 languages
CleanUNet
Official Implementation of CleanUNet in PyTorch
Comprehensive-E2E-TTS
A non-autoregressive end-to-end text-to-speech (text-to-wav) model, supporting a family of state-of-the-art unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.
DDDM-VC
Official PyTorch implementation of "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
epitran
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
FastDiff
PyTorch Implementation of FastDiff (IJCAI'22)
HiFiplusplus-pytorch
HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement
Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level MLLM on Your Phone
mosst
speech translation
MSMC-TTS
Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS
NKF-AEC
Acoustic Echo Cancellation with Neural Kalman Filtering
nuwave2
NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates @ INTERSPEECH 2022
Prompt-Singer
Implementation of Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt (NAACL'24).
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
SF-Net
The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"
sgmse
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
SiFiGAN
Official implementation of the source-filter HiFiGAN vocoder
so-vits-svc
SoftVC VITS Singing Voice Conversion
Sovits
An implementation of the combination of Soft-VC and VITS
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
StyleTTS
Official Implementation of StyleTTS
T2A
Project page for "T2A: Robust Text-to-Animation" (ICASSP 2023)
VITS-BigVGAN-SpanPSP-Chinese
A PyTorch-based Chinese TTS model built on VITS-BigVGAN, with an added prosody prediction model.
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild