symao (Maoshuiyang)

Maoshuiyang


Company: The Chinese University of Hong Kong

Location: Hong Kong


symao's starred repositories

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Stargazers: 418 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20444 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4384 | Issues: 0

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python | Stargazers: 138 | Issues: 0

Language: Python | Stargazers: 875 | Issues: 0

RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Language: Python | License: NOASSERTION | Stargazers: 133 | Issues: 0

Speech-Editing-Toolkit

A repository of implementations of neural speech editing algorithms.

Language: Python | Stargazers: 178 | Issues: 0

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

License: MIT | Stargazers: 1256 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 9015 | Issues: 0

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 9400 | Issues: 0

Make-A-Scene

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Language: Python | License: MIT | Stargazers: 329 | Issues: 0

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1132 | Issues: 0

pyllama

LLaMA: Open and Efficient Foundation Language Models

Language: Python | License: GPL-3.0 | Stargazers: 2801 | Issues: 0

SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

Stargazers: 72 | Issues: 0

TTS-TextAnalyzer

TTS Text Analyzer

License: Apache-2.0 | Stargazers: 32 | Issues: 0

Text-to-sound-Synthesis

The source code of our paper "Diffsound: Discrete Diffusion Model for Text-to-Sound Generation"

Language: Python | Stargazers: 343 | Issues: 0

lyra

A Very Low-Bitrate Codec for Speech Compression

Language: C++ | License: Apache-2.0 | Stargazers: 3806 | Issues: 0

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | Stargazers: 991 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT | Stargazers: 735 | Issues: 0

Language: Python | License: MIT | Stargazers: 427 | Issues: 0

g2p-kd

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion

Language: Python | License: NOASSERTION | Stargazers: 20 | Issues: 0

phonemizer

Simple text to phones converter for multiple languages

Language: Python | License: GPL-3.0 | Stargazers: 1175 | Issues: 0

SoundStorm

The reproduced code for Google's SoundStorm

Language: Python | Stargazers: 235 | Issues: 0

AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Language: Python | Stargazers: 550 | Issues: 0

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

License: MIT | Stargazers: 548 | Issues: 0

naturalspeech2-pytorch

Implementation of Natural Speech 2, a zero-shot speech and singing synthesizer, in PyTorch

Language: Python | License: MIT | Stargazers: 1250 | Issues: 0

vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Language: Python | License: Apache-2.0 | Stargazers: 1971 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 34884 | Issues: 0