symao's starred repositories
webdataset
PyTorch dataset for large-scale data loading
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Speech-Editing-Toolkit
A repository of implementations of neural speech editing algorithms.
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Make-A-Scene
PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
TTS-TextAnalyzer
TTS Text Analyzer
Text-to-sound-Synthesis
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
chinese_speech_pretrain
Chinese speech pretrained models