sunnnnnnnny's repositories
open-musiclm
Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.
AdaSpeech
An implementation of Microsoft's "AdaSpeech: Adaptive Text to Speech for Custom Voice"
Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion
A paper and project list covering cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting works (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, SSL-based ASR, etc.).
CDFSE_FastSpeech2
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
Chinese-FastSpeech2
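Continues training on the Biaobei dataset and improves the original FastSpeech2 model by introducing a prosody representation and a prosody prediction module, making the Chinese pronunciation more vivid and rhythmic.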
text
my first
chinese_speech_pretrain
Chinese speech pretrained models
DALL-E
PyTorch package for the discrete VAE used for DALL·E.
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder archi
espnet_onnx
ONNX wrapper for ESPnet inference models
g2p
g2p: English Grapheme To Phoneme Conversion
jets
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
polyphone
Chinese polyphone disambiguation for Text-to-Speech application
riffusion
Stable diffusion for real-time music generation
tacotron2
Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
tacotron2-emo
Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow
tacotron2-nvidia
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
VISinger2
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
w2v2-how-to
How to use our public wav2vec2 dimensional emotion model