Beast code in Giters

A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.

Language: Python, License: MIT

ControlNet

Let us control diffusion models!

Language: Python, License: Apache-2.0

diffsptk

A differentiable version of SPTK

Language: Python, License: Apache-2.0

diffusion_distiller

🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models" (v-diffusion)

Language: Python, License: MIT

FastDiff

PyTorch Implementation of FastDiff (IJCAI'22)

Language: Python

GeneFace

GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code

Language: Python, License: MIT

google-research

Google Research

License: Apache-2.0

GST-Tacotron

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

License: MIT

iSTFTNet-pytorch

iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

Language: Python, License: Apache-2.0

MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

License: Apache-2.0

MQTTS

Language: Python, License: MIT

MSMC-TTS

Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS

Language: Python, License: MIT

NeuralSVB

Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Language: Python

nix-tts

🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

License: MIT

nnsvs

Neural network-based singing voice synthesis library for research

License: MIT

PitchExtractor

Deep Neural Pitch Extractor for Voice Conversion and TTS Training

License: MIT

ProDiff

PyTorch Implementation of ProDiff (ACM-MM'22) with an extremely fast diffusion speech synthesis pipeline

Language: Python, License: MIT

self-supervised-phone-segmentation

Phoneme segmentation using pre-trained speech models

Language: Python, License: GPL-3.0

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook, License: NOASSERTION

StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

License: MIT

state-spaces

Sequence Modeling with Structured State Spaces

License: Apache-2.0

StyleFlow

StyleFlow: Attribute-conditioned Exploration of StyleGAN-generated Images using Conditional Continuous Normalizing Flows (ACM TOG 2021)

Language: Python

UUVC

Language: Python

valle

Zero-Shot Text-To-Speech

Language: Python, License: Apache-2.0