Ewald Enzinger's repositories
pflowtts_pytorch
Unofficial implementation of NVIDIA P-Flow TTS paper
TransformersSpeechAligner
Long-form speech-to-text alignment based on Hugging Face Transformers.
agc
Audiogen Codec
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audiossl
A library built for easier audio self-supervised training and downstream task evaluation
control-vc
This is the implementation of "ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm"
distil-whisper
Distilled variant of Whisper for speech recognition: 6x faster and 50% smaller, within 1% word error rate of the original Whisper.
flutter_onnx
ONNX runtime plugin for Flutter
flutter_sherpa_onnx
Flutter plugin wrapping the Sherpa-ONNX runtime
last
A JAX library for building lattice-based speech transducer models
OverFlow
Probabilistic speech synthesis by combining neural HMM TTS with normalising flows
pyannote-audio_overlapped-speech-detection_cpp
C++ version of the pyannote.audio overlapped speech detection pipeline
stable-ts
Timestamping Spoken Words
Triton-Puzzles
Puzzles for learning Triton
tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
utut
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
valle
Zero-Shot Text-To-Speech
WhisperKit
Swift-native on-device speech recognition with Whisper for Apple Silicon
whisperkittools
Python tools for WhisperKit: Model conversion, optimization and evaluation