Ewald Enzinger's repositories
Aligners
HMM, CTC, RNN-Transducer, forward-backward algorithm
ARMHuBERT
PyTorch Implementation of ARMHuBERT (INTERSPEECH 2023)
bark-voice-cloning-HuBERT-quantizer
The code for the bark-voicecloning model. Training and inference.
bigvsan
PyTorch implementation of BigVSAN
ddc_onset
Music onset detector from Dance Dance Convolution packaged as a lightweight PyTorch module
kaldi-decoder
Decoders from Kaldi using OpenFst
kaldialign
Python wrappers for Kaldi's Levenshtein distance and alignment code.
knn-vc
Voice Conversion With Just Nearest Neighbors
libriheavy
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
lvc-vc
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Matcha-TTS
🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
MB-iSTFT-VITS2
Application of MB-iSTFT-VITS components to vits2_pytorch
miipher
Unofficial implementation of miipher
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
multipa
Universal multilingual automatic speech transcription into IPA
naturalspeech
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
naturalspeech2-pytorch
Implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer, in PyTorch
NS2VC
Unofficial implementation of NaturalSpeech2 for Voice Conversion
sherpa
Streaming and non-streaming ASR server in Python
SoundStorm
The reproduced code for Google's SoundStorm
spear-tts-pytorch
An unofficial PyTorch implementation of SPEAR-TTS.
SpeechTokenizer
Code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
text_search
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
tract
Tiny, no-nonsense, self-contained TensorFlow and ONNX inference
vits2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design