Suwon Yang's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
OpenVoice
Instant voice cloning by MyShell
awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Bert-VITS2
VITS2 backbone with BERT
BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
ChatTTS
A generative speech model for daily dialogue.
CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
EmoSphere-TTS
The official implementation of EmoSphere-TTS
emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: self-supervised pre-training for speech emotion representation
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
GPT-SoVITS
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
langchain
⚡ Building applications with LLMs through composability ⚡
MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
metavoice-src
Foundational model for human-like, expressive TTS
mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
NeMo
NeMo: a toolkit for conversational AI
parler-tts
Inference and training library for high-quality TTS models.
PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
soundstorm-pytorch
Implementation of SoundStorm, efficient parallel audio generation from Google DeepMind, in PyTorch
speechbrain
A PyTorch-based Speech Toolkit
StyleTTS
Official Implementation of StyleTTS
WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)