Yeongtae's starred repositories
SenseVoice
Multilingual Voice Understanding Model
matmulfreellm
Implementation for MatMul-free LM.
Speech2RIR
Official implementation of a reverberant-speech-to-room-impulse-response estimator
Diff-HierVC
Official PyTorch implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation"
TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
pyannote-metrics
A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
voxconverse
Spot the conversation: speaker diarisation in the wild
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
TTSDatasetRecorder
A simple app for recording speech datasets.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
metavoice-src
Foundational model for human-like, expressive TTS
GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
clone-voice
A voice cloning tool with a web interface that uses your voice or any other sound to record audio
resemble-enhance
AI-powered speech denoising and enhancement
Automatic-Prosody-Annotator-with-SSWP-CLAP
An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS).