sunnnnnnnny's repositories
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
Tacotron2-PyTorch
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.
VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
DiffGAN-TTS
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
ltu
GitHub repo for the paper "Listen, Think, and Understand".
ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
lora-svc
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
CLAP
Learning audio concepts from natural language supervision
bark
🔊 Text-Prompted Generative Audio Model
melgan-neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
g2pW_chinese
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)
vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
StarGAN-Voice-Conversion-2
A PyTorch implementation of StarGAN-VC2
pytorch-StarGAN-VC
Fully reproduces the StarGAN-VC paper. Stable training and better audio quality.
polyphone-g2pL
The implementation of g2pL with a new open dataset.
INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
TTS-TextAnalyzer
TTS Text Analyzer
SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
FastSpeech_sing
An implementation of FastSpeech based on PyTorch.