Shan Yang's starred repositories

Language: Python | Stargazers: 69 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10434 | Issues: 0

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python | License: Apache-2.0 | Stargazers: 24231 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 35240 | Issues: 0

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook | Stargazers: 10112 | Issues: 0

CLAP

Contrastive Language-Audio Pretraining

Language: Python | License: CC0-1.0 | Stargazers: 1216 | Issues: 0

Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Language: Python | License: Apache-2.0 | Stargazers: 1451 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 19984 | Issues: 0

fma

FMA: A Dataset For Music Analysis

Language: Jupyter Notebook | License: MIT | Stargazers: 2163 | Issues: 0

SpeechGPT

SpeechGPT Series: Speech Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 987 | Issues: 0

Fengshenbang-LM

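Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, intended as infrastructure for Chinese AIGC and cognitive intelligence.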

Language: Python | License: Apache-2.0 | Stargazers: 3939 | Issues: 0

MSMC-TTS

Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS

Language: Python | License: MIT | Stargazers: 157 | Issues: 0

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python | License: Apache-2.0 | Stargazers: 10388 | Issues: 0

s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 2133 | Issues: 0

app

Web metaverse client

Language: JavaScript | License: MIT | Stargazers: 337 | Issues: 0

cursorless

Don't let the cursor slow you down

Language: TypeScript | License: MIT | Stargazers: 1087 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 163 | Issues: 0

PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Language: Python | License: Apache-2.0 | Stargazers: 6156 | Issues: 0

AudioDVP

AudioDVP: Photorealistic Audio-driven Video Portraits

Language: Python | Stargazers: 295 | Issues: 0

visqol

Perceptual Quality Estimator for speech and audio

Language: C++ | License: Apache-2.0 | Stargazers: 632 | Issues: 0

opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

Stargazers: 207 | Issues: 0

vocoder-benchmark

A repository for benchmarking neural vocoders by their quality and speed.

Language: Python | License: NOASSERTION | Stargazers: 194 | Issues: 0

ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Language: C++ | License: NOASSERTION | Stargazers: 19546 | Issues: 0

cargan

Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"

Language: Python | License: MIT | Stargazers: 180 | Issues: 0

Maix-Speech

Maix Speech AI lib, a fast and small speech lib running on embedded devices, including ASR, chat, TTS etc.

Language: Python | License: NOASSERTION | Stargazers: 305 | Issues: 0

WavAugment

A library for speech data augmentation in time-domain

Language: Python | License: MIT | Stargazers: 630 | Issues: 0

Awesome-Digital-Human

👽 A curated list of resources related to digital human.

Stargazers: 6 | Issues: 0

diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Language: Python | License: Apache-2.0 | Stargazers: 728 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 255 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 88 | Issues: 0