Aby Louw's repositories
APNet2
Source code of APNet2, a vocoder
audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
audiowmark
Audio Watermarking
ConsistencyVC-voive-conversion
Uses a jointly trained speaker encoder with a consistency loss to achieve cross-lingual and expressive voice conversion
convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by Lightning and Rye
dectalk
Modern builds for the 90s/00s DECtalk text-to-speech application.
descript-audio-vae
VAE-GAN adapted from Descript Audio Codec, replacing the RVQ with a VAE
DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
flet
Flet enables developers to easily build realtime web, mobile and desktop apps in Python. No frontend experience required.
istftnet
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
LipSick
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
Matcha-TTS
🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
MB-iSTFT-VITS2
Application of MB-iSTFT-VITS components to vits2_pytorch
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
onnx-simplifier
Simplify your ONNX model
pflowtts_pytorch
Unofficial implementation of the NVIDIA P-Flow TTS paper
QuickVC-VoiceConversion
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Real3DPortrait
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
UniCATS-CTX-vec2wav
Code for CTX-vec2wav in UniCATS
VoiceFlow-TTS
This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
wavmark
AI-based Audio Watermarking Tool
X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
yaml-ui-editor
YAML UI editor application with Git repository storage
ZEST
Zero-Shot Emotion Style Transfer
ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations