Yeongtae

Official PyTorch Implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation"

Language: Python · 18000

TTS-arxiv-daily

Automatically Update Text-to-Speech (TTS) Papers Daily Using GitHub Actions (Updates Every 12 Hours)

Language: Python · Apache-2.0 · 16200

AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

Language: Jupyter Notebook · MIT · 3351600

SemantiCodec

Language: HTML · 3700

diarizers

Language: Python · 23100

pyannote-metrics

A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems

Language: Python · MIT · 18200

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

52700

voxconverse

Spot the conversation: speaker diarisation in the wild

11500

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Jupyter Notebook · NOASSERTION · 729400

champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Language: Python · MIT · 351900

TTSDatasetRecorder

A simple app for recording speech datasets.

Language: Python · 2500

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python · MIT · 1107400

yt-dlp

A feature-rich command-line audio/video downloader

Language: Python · Unlicense · 7851000

dust3r

DUSt3R: Geometric 3D Vision Made Easy

Language: Python · NOASSERTION · 480600

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python · Apache-2.0 · 359300

DDDM-VC

Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)

Language: Python · 15700

ar-vits

Text-to-speech using an autoregressive transformer and VITS

Language: Python · MIT · 21600

GPT-SoVITS

Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python · MIT · 3018900

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Language: Python · Apache-2.0 · 408800

TinySAM

Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"

Language: Python · Apache-2.0 · 38100

clone-voice

A sound cloning tool with a web interface that uses your voice or any other sound to record audio

Language: Python · NOASSERTION · 686200

resemble-enhance

AI-powered speech denoising and enhancement

Language: Python · MIT · 115200

OpenVoice

Instant voice cloning by MyShell.

Language: Python · MIT · 2775400

Automatic-Prosody-Annotator-with-SSWP-CLAP

An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS).

Language: Python · Apache-2.0 · 4200