Xiaomin Tang's repositories
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
CycleGAN-VC2
Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC2
dpss-exp3-VC-PPG
Voice conversion experiments for the THUHCSI course "Digital Processing of Speech Signals"
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
efficient_tts
PyTorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"
isobar
A Python library for creating and manipulating musical patterns, designed for use in algorithmic composition, generative music and sonification. Can be used to generate MIDI events, MIDI files, OSC messages, or custom events.
malaya-speech
Speech toolkit for Bahasa Malaysia, https://malaya-speech.readthedocs.io/
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
OMGD
Online Multi-Granularity Distillation for GAN Compression (ICCV 2021)
OSM-one-shot-multispeaker
Framework for a one-shot multi-speaker system based on deep learning
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
reinforcement-learning-an-introduction
Python Implementation of Reinforcement Learning: An Introduction
stargan
StarGAN - Official PyTorch Implementation (CVPR 2018)
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
StreamingCNN
To train deep convolutional neural networks, the input data and the activations need to be kept in memory. Given the limited memory available in current GPUs, this limits the maximum dimensions of the input data. Here we demonstrate a method to train convolutional neural networks while holding only parts of the image in memory.
VAENAR-TTS
The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
VQMIVC
Official implementation of VQMIVC: One-shot Voice Conversion @ Interspeech 2021