Maoshuiyang

The Chinese University of Hong Kong

Hong Kong

https://maoshuiyang.github.io/

symao's starred repositories

qa-mdt

OpenMusic: SOTA Text-to-music (TTM) Generation

Language: Python, License: MIT, 45700

webdataset

A dataset for large-scale data loading in PyTorch

Language: Python, 1100

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python, License: BSD-3-Clause, 226600

OmniSenseVoice

Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯

Language: Python, 61800

MusicGen-colab

Language: Jupyter Notebook, License: Unlicense, 52100

RSTnet

Real-time Speech-Text Foundation Model Toolkit (wip)

Language: Python, 11300

FluxMusic

Text-to-Music Generation with Rectified Flow Transformer

700

FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Language: Python, License: MPL-2.0, 32800

MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook, 36800

zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

Language: Jupyter Notebook, License: MIT, 290800

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python, License: NOASSERTION, 90000

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python, License: MIT, 2079700

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python, License: MIT, 453200

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python, 16100

seed-tts-eval

Language: Python, 99400

WenetSpeechSpeakerCluster

5500

RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Language: Python, License: NOASSERTION, 15200

Speech-Editing-Toolkit

It's a repository for implementations of neural speech editing algorithms.

Language: Python, 18900

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

License: MIT, 127500

trl

Train transformer language models with reinforcement learning.

Language: Python, License: Apache-2.0, 980500

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python, License: Apache-2.0, 1042400

Make-A-Scene

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Language: Python, License: MIT, 33300

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python, License: MIT, 118100

pyllama

LLaMA: Open and Efficient Foundation Language Models

Language: Python, License: GPL-3.0, 280700

SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

7400

TTS-TextAnalyzer

TTS Text Analyzer

License: Apache-2.0, 3100

Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Language: Python, 34600

lyra

A Very Low-Bitrate Codec for Speech Compression

Language: C++, License: Apache-2.0, 383000

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell, 102100