Xiaomin Tang's repositories
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
CycleGAN-VC2
Voice Conversion by CycleGAN (voice cloning/voice conversion): CycleGAN-VC2
dpss-exp3-VC-PPG
Voice Conversion Experiments for THUHCSI Course: <Digital Processing of Speech Signals>
efficient_tts
PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"
malaya-speech
Speech Toolkit for Bahasa Malaysia, https://malaya-speech.readthedocs.io/
MockingBird
🚀 AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content. Clone a voice in 5 seconds to generate arbitrary speech in real-time
Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
OSM-one-shot-multispeaker
Framework for one-shot multispeaker system based on Deep Learning
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
reinforcement-learning-an-introduction
Python Implementation of Reinforcement Learning: An Introduction
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
StreamingCNN
To train deep convolutional neural networks, the input data and the activations need to be kept in memory. Given the limited memory available in current GPUs, this limits the maximum dimensions of the input data. Here we demonstrate a method to train convolutional neural networks while holding only parts of the image in memory.
VAENAR-TTS
The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.