Suwon Yang's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
OpenVoice
Instant voice cloning by MyShell
awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Bert-VITS2
VITS2 backbone with BERT
BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
ChatTTS
A generative speech model for daily dialogue.
CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
EmoSphere-TTS
The official implementation of EmoSphere-TTS
emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: self-supervised pre-training for speech emotion representation
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
GPT-SoVITS
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
langchain
⚡ Building applications with LLMs through composability ⚡
MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
metavoice-src
Foundational model for human-like, expressive TTS
mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
NeMo
NeMo: a toolkit for conversational AI
parler-tts
Inference and training library for high-quality TTS models.
PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
soundstorm-pytorch
Implementation of SoundStorm, efficient parallel audio generation from Google DeepMind, in PyTorch
speechbrain
A PyTorch-based Speech Toolkit
StyleTTS
Official Implementation of StyleTTS
WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)