Ewald Enzinger's repositories
agc
Audiogen Codec
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
FreeV
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
gazelle-train
Joint speech-language model - respond directly to audio!
istft-onnx
Export an ONNX graph that performs ISTFT. Designed for TTS models.
languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
pyannote-audio_overlapped-speech-detection_cpp
C++ version of the pyannote.audio overlapped speech detection pipeline
Qwen2-Audio
The official repo of Qwen2-Audio, a chat and pretrained large audio language model proposed by Alibaba Cloud.
Toroidal-PSDA
A probabilistic scoring backend for length-normalized embeddings.
Train_Hifigan_XTTS
An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.
Triton-Puzzles
Puzzles for learning Triton
tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
voxangeles
VoxAngeles Corpus
WhisperKit
Swift native on-device speech recognition with Whisper for Apple Silicon
whisperkittools
Python tools for WhisperKit: Model conversion, optimization and evaluation