MMMMichaelzhang's repositories
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
assem-vc
Official Code for Assem-VC @ICASSP2022
asteroid
The PyTorch-based audio source separation toolkit for researchers || Pretrained models available
AudioMass
Free full-featured web-based audio & waveform editing tool
AuxiliaryASR
Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
bark
🔊 Text-Prompted Generative Audio Model
CMGAN
Conformer-based Metric GAN for speech enhancement
Cross-Lingual-Voice-Cloning
Tacotron 2 - PyTorch implementation with faster-than-realtime inference, modified to enable cross-lingual voice cloning.
DeepFilterNet
Noise suppression using deep filtering
demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
FACIAL
FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.
fastVC
A simple voice conversion tool
FullSubNet-plus
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
matchering
🎚️ Open Source Audio Matching and Mastering
mellotron
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
mir-svc
Unsupervised WaveNet-based Singing Voice Conversion Using Pitch Augmentation and Two-phase Approach
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
NeuralSVB
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code
nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC
ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
PitchExtractor
Deep Neural Pitch Extractor for Voice Conversion and TTS Training
Retrieval-based-Voice-Conversion-WebUI
Voice data <= 10 mins can also be used to train a good VC model!
so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
ssr_eval
Evaluation and Benchmarking of Speech Super-resolution Methods
StyleTTS
Official Implementation of StyleTTS
svoice
A PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", which presents a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The method employs gated neural networks that are trained to separate the voices at multiple processing steps while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is used to select the actual number of speakers in a given sample. According to the authors, the method greatly outperforms the current state of the art, which they show is not competitive for more than two speakers.
ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.