Shan Yang's starred repositories
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Fengshenbang-LM
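Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.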
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
cursorless
Don't let the cursor slow you down
PyTorch-VAE
A Collection of Variational Autoencoders (VAE) in PyTorch.
vocoder-benchmark
A repository for benchmarking neural vocoders by their quality and speed.
Maix-Speech
Maix Speech AI library, a fast and small speech library running on embedded devices, including ASR, chat, TTS, etc.
WavAugment
A library for speech data augmentation in time-domain
Awesome-Digital-Human
👽 A curated list of resources related to digital humans.