Ewald Enzinger's repositories
AudioDiffuser
Companion codebase for the paper "A Review on Score-based Generative Models for Audio Applications" (https://arxiv.org/abs/2506.08457)
bournemouth-forced-aligner
Extract phoneme-level timestamps from speech audio. MFA alternative. Work in progress.
CapSpeech
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
chatterbox
SoTA open-source TTS
contexless-phonemes-CUPE
PyTorch model for contextless phoneme prediction from speech audio
delayed-streams-modeling
Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.
Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling" https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf
DiFlow-TTS
DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-to-Speech
EZ-VC
Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion [EMNLP 2025 Findings]
Flamed-TTS
This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on efficient generation and dynamic pacing in speech synthesis.
HH-Codec
[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
hnet
H-Net: Hierarchical Network with Dynamic Chunking
index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
InfiniteTalk
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
KittenTTS
State-of-the-art TTS model under 25MB 😻
learnable-speech
Text-to-speech with a learnable audio encoder, without alignment to a transcript reference
Marco-Voice
A Unified Framework for Expressive Speech Synthesis with Voice Cloning
OpenReader-WebUI
Web-based EPUB and PDF text-to-speech document reader. Read documents in real time with high-quality TTS, or extract audiobooks. Use your own Kokoro TTS API or OpenAI API endpoint.
rwkv-tts-rs
RWKV-based Text-to-Speech implementation in Rust
S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
senko
Very fast speaker diarization
tts
Inworld TTS
UniAudio2
The open-source code of UniAudio2.0
unmute
Make text LLMs listen and speak
zipa
A family of efficient speech models for multilingual phone recognition