Ewald Enzinger's repositories
faster-whisper
Faster Whisper transcription with CTranslate2
AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Auto_Tuning_Zeroshot_TTS_and_VC
Official implementation of "Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis", Interspeech 2023
BetaVAE_VC
Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE"
D-TDNN
PyTorch implementation of Densely Connected Time Delay Neural Network
DPHuBERT
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
encodec
State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
fad_pytorch
Frechet Audio Distance evaluation in PyTorch
fstalign
An efficient OpenFST-based tool for calculating WER and aligning two transcript sequences.
oobleck
open soundstream-ish VAE codecs for downstream neural audio synthesis
pyctcdecode
A fast and lightweight Python-based CTC beam search decoder for speech recognition.
riva-asrlib-decoder
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
SAT
Streaming Audio Transformers for online audio tagging
sequence_align
Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.
SnakeGAN
Please visit https://thuhcsi.github.io/SnakeGAN/
so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
SV_eval_protocols_for_SD
Speaker verification evaluation protocols simulating speaker diarisation
tango
Code and models for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
Waveformer
An efficient architecture for real-time target sound extraction.
whisper-finetuning
[WIP] Scripts for fine-tuning Whisper
whisper-punctuator
Zero-shot Punctuation Insertion using Whisper
zm-text-tts
Learning to Speak from Text for Low-Resource TTS