Aby Louw's repositories

APNet2

Source code of APNet2, a vocoder

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

audioseal

Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

audiowmark

Audio Watermarking

Language: C++ | License: GPL-3.0 | Stargazers: 0 | Issues: 0

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Language: Python | Stargazers: 0 | Issues: 0

ConsistencyVC-voive-conversion

Uses a jointly trained speaker encoder with a consistency loss to achieve cross-lingual and expressive voice conversion

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

convnext_tts

Unofficial implementation of ConvNeXt-TTS powered by lightning and Rye

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

dectalk

Modern builds for the 90s/00s DECtalk text-to-speech application.

Language: PostScript | License: NOASSERTION | Stargazers: 0 | Issues: 0

descript-audio-vae

VAE-GAN modified from the Descript Audio Codec, replacing the RVQ with a VAE

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

DiscreteSpeechMetrics

Reference-aware automatic speech evaluation toolkit

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

istft-onnx

Export an ONNX graph that performs ISTFT. Designed for TTS models.

Language: Python | Stargazers: 0 | Issues: 0

LipSick

🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮

Language: Python | Stargazers: 0 | Issues: 0

Matcha-TTS

🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

MB-iSTFT-VITS2

Application of MB-iSTFT-VITS components to vits2_pytorch

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction

Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)

Language: Python | Stargazers: 0 | Issues: 0

onnx-simplifier

Simplify your ONNX model

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

pflow-encodec

Implementation of a TTS model based on the NVIDIA P-Flow TTS paper

Language: Python | Stargazers: 0 | Issues: 0

pflowtts_pytorch

Unofficial implementation of the NVIDIA P-Flow TTS paper

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Real3DPortrait

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code

Language: Python | Stargazers: 0 | Issues: 0

RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

UniCATS-CTX-vec2wav

Code for CTX-vec2wav in UniCATS

Language: Python | Stargazers: 0 | Issues: 0

VoiceFlow-TTS

This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Language: Python | Stargazers: 0 | Issues: 0

wavenext_pytorch

Unofficial implementation of the WaveNeXt vocoder

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

wavmark

AI-based Audio Watermarking Tool

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

X-E-Speech-code

X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

ZEST

Zero-Shot Emotion Style Transfer

Language: Python | Stargazers: 0 | Issues: 0

ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Language: C | License: BSD-3-Clause | Stargazers: 0 | Issues: 0