entn-at

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling" https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf

DiFlow-TTS

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-to-Speech

EZ-VC

Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion [EMNLP 2025 Findings]

License: MIT

Flamed-TTS

This repository implements a novel zero-shot TTS framework, Flamed-TTS, focused on efficient generation and dynamic pacing in speech synthesis.

Hear-Me-Out

License: MIT

HH-Codec

[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

License: Apache-2.0

hnet

H-Net: Hierarchical Network with Dynamic Chunking

License: MIT

index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Language: Python, License: Apache-2.0

InfiniteTalk

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

License: Apache-2.0

JoyTTS

KittenTTS

State-of-the-art TTS model under 25MB 😻

License: Apache-2.0

learnable-speech

This repo implements text-to-speech with a learnable audio encoder, without alignment to a transcript reference.

Marco-Voice

A Unified Framework for Expressive Speech Synthesis with Voice Cloning

License: Apache-2.0

moshi

Language: Python, License: Apache-2.0

OpenReader-WebUI

Web-based EPUB and PDF text-to-speech document reader. Read documents in real time with high-quality TTS, or extract audiobooks. Use your own Kokoro TTS API or OpenAI API endpoint.

Language: TypeScript, License: MIT