symao (Maoshuiyang)

Maoshuiyang


Company: The Chinese University of Hong Kong

Location: Hong Kong

Home Page: https://maoshuiyang.github.io/


symao's starred repositories

qa-mdt

OpenMusic: SOTA Text-to-music (TTM) Generation

Language: Python | License: MIT | Stargazers: 457 | Issues: 0

webdataset

A dataset for large-scale data loading in PyTorch

Language: Python | Stargazers: 11 | Issues: 0

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python | License: BSD-3-Clause | Stargazers: 2266 | Issues: 0

OmniSenseVoice

Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯

Language: Python | Stargazers: 618 | Issues: 0
Language: Jupyter Notebook | License: Unlicense | Stargazers: 521 | Issues: 0

RSTnet

Real-time Speech-Text Foundation Model Toolkit (wip)

Language: Python | Stargazers: 113 | Issues: 0

FluxMusic

Text-to-Music Generation with Rectified Flow Transformer

Stargazers: 7 | Issues: 0

FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Language: Python | License: MPL-2.0 | Stargazers: 328 | Issues: 0

MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook | Stargazers: 368 | Issues: 0

zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

Language: Jupyter Notebook | License: MIT | Stargazers: 2908 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 12286 | Issues: 0

VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Language: Python | License: NOASSERTION | Stargazers: 900 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20797 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | License: MIT | Stargazers: 4532 | Issues: 0

naturalspeech3_facodec

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Language: Python | Stargazers: 161 | Issues: 0
Language: Python | Stargazers: 994 | Issues: 0

RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Language: Python | License: NOASSERTION | Stargazers: 152 | Issues: 0

Speech-Editing-Toolkit

A repository of implementations of neural speech editing algorithms.

Language: Python | Stargazers: 189 | Issues: 0

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

License: MIT | Stargazers: 1275 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 9805 | Issues: 0

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 10424 | Issues: 0

Make-A-Scene

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Language: Python | License: MIT | Stargazers: 333 | Issues: 0

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1181 | Issues: 0

pyllama

LLaMA: Open and Efficient Foundation Language Models

Language: Python | License: GPL-3.0 | Stargazers: 2807 | Issues: 0

SpeechGen

"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"

Stargazers: 74 | Issues: 0

TTS-TextAnalyzer

TTS Text Analyzer

License: Apache-2.0 | Stargazers: 31 | Issues: 0

Text-to-sound-Synthesis

The source code of our paper "Diffsound: Discrete Diffusion Model for Text-to-Sound Generation"

Language: Python | Stargazers: 346 | Issues: 0

lyra

A Very Low-Bitrate Codec for Speech Compression

Language: C++ | License: Apache-2.0 | Stargazers: 3830 | Issues: 0

chinese_speech_pretrain

Chinese speech pretrained models

Language: Shell | Stargazers: 1021 | Issues: 0