Konstantine Goudz's starred repositories

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 29937 | Issues: 426 | Issues: 4173

Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

Language: Python | License: MIT | Stargazers: 21803 | Issues: 162 | Issues: 1536

ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

Language: Python | License: MIT | Stargazers: 16907 | Issues: 153 | Issues: 1206

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ | License: MIT | Stargazers: 7723 | Issues: 76 | Issues: 152

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

Language: Python | License: MIT | Stargazers: 7476 | Issues: 82 | Issues: 151

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4339 | Issues: 58 | Issues: 141

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | Stargazers: 3602 | Issues: 47 | Issues: 173

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: MIT | Stargazers: 3338 | Issues: 58 | Issues: 70

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Language: Python | License: MIT | Stargazers: 2338 | Issues: 60 | Issues: 167

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python | License: Apache-2.0 | Stargazers: 1960 | Issues: 50 | Issues: 126

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

Language: Python | License: MIT | Stargazers: 1245 | Issues: 54 | Issues: 31

AVeryComfyNerd

ComfyUI related stuff and things

License: MIT | Stargazers: 1157 | Issues: 40 | Issues: 0

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: MIT | Stargazers: 1143 | Issues: 57 | Issues: 50

papermage

Library supporting NLP and CV research on scientific papers

Language: Python | License: Apache-2.0 | Stargazers: 654 | Issues: 9 | Issues: 31

bark-voice-cloning-HuBERT-quantizer

The code for the bark-voicecloning model. Training and inference.

Language: Python | License: MIT | Stargazers: 621 | Issues: 17 | Issues: 42

voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Language: Python | License: MIT | Stargazers: 569 | Issues: 50 | Issues: 25

HD-Painter

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Language: Python | License: MIT | Stargazers: 261 | Issues: 21 | Issues: 19

spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

Language: Python | License: MIT | Stargazers: 249 | Issues: 28 | Issues: 6

EfficientWord-Net

OneShot Learning-based hotword detection.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 215 | Issues: 12 | Issues: 36

CoMoSpeech

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

Language: Python | License: MIT | Stargazers: 168 | Issues: 11 | Issues: 11

Bridge-TTS

Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).

UniCATS-CTX-vec2wav

[AAAI 2024] Code for CTX-vec2wav in UniCATS

VALL-E-X-Trainer-by-CustomData

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

Language: Python | License: MIT | Stargazers: 62 | Issues: 5 | Issues: 0

UniCATS-CTX-txt2vec

[AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS

bark-data-gen

Create training data for a voice cloner for Bark text-to-speech.

Language: Jupyter Notebook | License: MIT | Stargazers: 44 | Issues: 3 | Issues: 4

audio-diffusion-pytorch-fork

Audio generation using diffusion models, in PyTorch.

Language: Python | License: MIT | Stargazers: 44 | Issues: 2 | Issues: 0

EDMSound

Codebase and project page for EDMSound

Language: Python | License: MIT | Stargazers: 28 | Issues: 0 | Issues: 0

bark-with-voice-clone

🔊 Text-prompted Generative Audio Model - With the ability to clone voices

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 20 | Issues: 1 | Issues: 0