Maoshuiyang

The Chinese University of Hong Kong

Hong Kong

symao's starred repositories

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python · MIT · 2044400

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python · MIT · 438400

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python · 13800

seed-tts-eval

Language: Python · 87500

WenetSpeechSpeakerCluster

5400

RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Language: Python · NOASSERTION · 13300

Speech-Editing-Toolkit

It's a repository for implementations of neural speech editing algorithms.

Language: Python · 17800

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

MIT · 125600

trl

Train transformer language models with reinforcement learning.

Language: Python · Apache-2.0 · 901500

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python · Apache-2.0 · 940000

Make-A-Scene

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Language: Python · MIT · 32900

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python · MIT · 113200

pyllama

LLaMA: Open and Efficient Foundation Language Models

Language: Python · GPL-3.0 · 280100

SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

7200

TTS-TextAnalyzer

TTS Text Analyzer

Apache-2.0 · 3200

Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Language: Python · 34300

lyra

A Very Low-Bitrate Codec for Speech Compression

Language: C++ · Apache-2.0 · 380600

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell · 99100

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python · MIT · 73500

VQ-Diffusion

Language: Python · MIT · 42700

g2p-kd

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion

Language: Python · NOASSERTION · 2000

phonemizer

Simple text to phones converter for multiple languages

Language: Python · GPL-3.0 · 117500

SoundStorm

The reproduced code for Google's SoundStorm

Language: Python · 23500

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python · 55000

NaturalSpeech2

3200

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

MIT · 54800

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch

Language: Python · MIT · 125000

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python · Apache-2.0 · 197100

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook · MIT · 3488400