macroustc's repositories
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audino
Open source audio annotation tool for humans
AudioLDM2
Text-to-Audio/Music Generation
Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Awesome-Video-Diffusion-Models
[Arxiv] A Survey on Video Diffusion Models
Bert-VITS2
VITS2 backbone with BERT
DeepLearningSystem
An introduction to the core principles of deep learning systems.
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
fish-speech
Brand new TTS solution
GPT-SoVITS
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
LLaSM
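The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.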
llm-paper-daily
Daily updated LLM papers. Subscriptions are welcome 👏 If you like it, please give it a 🌟
minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
NISQA
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Open-Sora
Building your own video generation model like OpenAI's Sora
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
piper
A fast, local neural text to speech system
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Speech-Resources
Labs, companies, resources, internships, and more in the speech field. Recommendations and self-nominations are welcome.
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
torchcrepe
PyTorch implementation of the CREPE pitch tracker
ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
UniAudio
The Open Source Code of UniAudio
VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild