Suwon Yang's repositories

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

OpenVoice

Instant voice cloning by MyShell

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

License: MIT | Stargazers: 0 | Issues: 0

Bert-VITS2

VITS2 backbone with BERT

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 0

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 0

CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

EmoSphere-TTS

The official implementation of EmoSphere-TTS

Language: Python | Stargazers: 0 | Issues: 0

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | Stargazers: 0 | Issues: 0

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

License: Apache-2.0 | Stargazers: 0 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

License: NOASSERTION | Stargazers: 0 | Issues: 0

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

GPT-SoVITS

One minute of voice data is enough to train a good TTS model (few-shot voice cloning).

License: MIT | Stargazers: 0 | Issues: 0

instruct-MusicGen

The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

langchain

⚡ Building applications with LLMs through composability ⚡

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

metavoice-src

Foundational model for human-like, expressive TTS

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

License: MIT | Stargazers: 0 | Issues: 0

NeMo

NeMo: a toolkit for conversational AI

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

parler-tts

Inference and training library for high-quality TTS models.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo

License: MIT | Stargazers: 0 | Issues: 0

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

StyleTTS

Official implementation of StyleTTS

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

License: MIT | Stargazers: 0 | Issues: 0

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0